1. INTRODUCTION AND SUMMARY


1.1 INTRODUCTION

The work reported on here has been exclusively concerned with the digital
domain operation of charge coupled devices. One common example of the digital
use of charge coupled devices is in the area of memory. But in the present
context, we mean a great deal more than just memory. Generally speaking, any
digital domain function can be accomplished with charge coupled devices; this
means, in particular, digital charge coupled logic (DCCL) functions and digital
arithmetic functions. Putting aside for the moment the question of how this is
done, let us first ask why this would be done. After all, the charge coupled
device technology produces a unique device in that it works as a sampled data
analog system.

In view of the fact that the CCD is unique in this respect, what are the
advantages of using the device in the digital domain? This can be answered by
addressing an even more fundamental question; namely, why use any digital
device? The reason that people have been using digital devices for some time
can be summarized in a few statements:

• Freedom from parameter variations
• Freedom from environment and environment changes
• Flexible in application
• Easily programmable
• Arbitrary accuracy in calculations
• Well known characteristics that are easily modeled and simulated
• Low cost due to widespread use

The above reasons are traditional in explaining the acceptance and wide
use of any digital device. What we gain when we use a CCD in the digital domain
is the addition of two other highly desirable attributes to the list. The CCD
brings with it low power requirements and high functional density capability.


This marriage of CCDs and digital technology increases the general
list of digital attributes and produces a unique combination that permits
the projection of devices and device characteristics that are otherwise
unobtainable. The low power advantage is clearly desirable for applications
that are space or man-pack related. The high functional density capability
is exploited in any situation where a large amount of computation is required
to perform an overall system function. The DCCL unit allows the designer
to place a large number of functions on a single chip, thereby eliminating
interface and overhead circuitry and significantly reducing the overall
chip count.

So far we have stated what some of the advantages of using DCCLs are.
We have yet to address the question of how these devices are implemented.
The basic DCCL technology has an obvious application in binary logic; each
storage position either has charge or it does not, and this fact represents a
one or a zero just as in a digital memory. Beyond this, however, we can extend
the use of CCDs to perform arbitrary Boolean algebra. This concept is
treated in detail below.

If our catalog of devices includes half-adders and full-adders along with
logic functions such as ANDs and ORs, then we can implement any arbitrary
logic or arithmetic function. There is one additional consideration, however.
Because the charge coupled device shifts charge at each clock pulse, we do
not have ripple-through logic capability but rather must implement all of
our functions in a pipeline manner.

The reason that pipeline calculations are required in arithmetic units
is associated with the generation of the carry bit at each stage. For example,
in the addition of two n-bit words, the two least significant bits can be added
immediately and produce their sum and carry outputs. The carry is then available
to be combined with the next significant bits and produce a new sum and
carry. In this manner the carry is delayed during each operation, and so the
application of the next significant bits must be delayed by an equal amount.
This requires a set of delays on the input lines. An analogous set of delays
must be inserted in series with the output lines in order that the entire output
word is available at one clock pulse sometime in the future. It is not
difficult, of course, to obtain these delays in the CCD structure since that is
the most natural operation for the device to perform. It does require
additional area, however, and in general leads to a larger active area for
the function. This added area can be removed in large scale functions where
we can work with skewed arithmetic.

Working with skewed arithmetic means simply that the data enters the
chip synchronously in time and passes through a set of delays that properly
skew all of the bits. Then an arithmetic operation (such as addition or
multiplication) is performed and the data is shifted on to another operation.
This technique continues until all of the operations have been done
on the data. The data is once more passed through a set of delays that
resynchronize all of the bits so that they are available at the output pins
at one point in time.
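The skewing idea can be made concrete with a small simulation. The following is a minimal sketch, not the circuit described in this report: it assumes one full-adder stage per bit position, an input skew of i clocks on bit i, and a one-clock carry register between stages. The function name `skewed_pipeline_add` and the word values are hypothetical.

```python
def skewed_pipeline_add(word_pairs, n_bits):
    """Clock-by-clock model of a bit-skewed pipelined adder.

    Assumption: stage i holds one full-adder, bit i of each word is delayed
    by i clocks on the way in, and each carry is registered for one clock.
    """
    n_words = len(word_pairs)
    carry_reg = [0] * (n_bits + 1)  # carry_reg[i]: carry entering stage i now
    sum_bits = [[0] * (n_bits + 1) for _ in range(n_words)]
    total_clocks = n_words + n_bits  # last carry-out emerges on the final clock

    for t in range(total_clocks):
        next_carry = [0] * (n_bits + 1)
        for i in range(n_bits):
            k = t - i  # word index seen by stage i, because of the input skew
            if 0 <= k < n_words:
                a, b = word_pairs[k]
                a_bit = (a >> i) & 1
                b_bit = (b >> i) & 1
                c_in = carry_reg[i]
                sum_bits[k][i] = a_bit ^ b_bit ^ c_in
                next_carry[i + 1] = (a_bit & b_bit) | (c_in & (a_bit ^ b_bit))
        k = t - n_bits  # word whose final carry-out is now available
        if 0 <= k < n_words:
            sum_bits[k][n_bits] = carry_reg[n_bits]
        carry_reg = next_carry

    # Deskew: collect all bits of each word into a single output value.
    results = [sum(bit << i for i, bit in enumerate(bits)) for bits in sum_bits]
    return results, total_clocks


sums, clocks = skewed_pipeline_add([(5, 3), (15, 15), (0, 9)], n_bits=4)
print(sums, clocks)
```

In this model a new pair of words is accepted on every clock, while any one word needs n_bits + 1 clocks before all of its sum bits exist; this is the skew that the input and output delay sets absorb.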

All of this means that we can eliminate the majority of the delays associated
with the arithmetic operations for functions performed internal to the
chip. Only the initial skewing delay and the final deskewing delay are required.
All the while the data is on the chip it can be manipulated in a skewed fashion.

There is another implication of using pipeline arithmetic. Since the data
enters at one clock pulse and exits at a clock pulse sometime in the future,
it is not efficient to do random calculations with pipeline techniques. This
means that this technology is best suited for signal processing functions that
operate on blocks of data at a time. It is not well suited to random calculations
that occur only occasionally. A large number of algorithms either already
are in a pipeline organization or can be cast into one, so that the
application of DCCL is not seriously restricted.

One other item is of note at this point: the throughput rate of pipeline
arithmetic calculations is very high. The data enters the device at the maximum
clock rate and the answer exits at some point in time later but still at
the maximum clock rate. The designer must therefore only account for the
series delay that is necessarily a part of the pipeline operation.

1.2  HISTORY 

In 1973, the Naval Research Laboratory issued a request for quotation for
a study program aimed at defining and analyzing those areas of application of
charge coupled devices (CCDs) in signal processing systems. The broad objective
of the RFQ was to initiate a study that would examine the impact of CCD
technology on signal processing systems. Implicit in such a statement, of
course, is the requirement to determine those areas of signal processing
systems where the use of CCDs offers an economic advantage. The extent of
that advantage, that is to say the impact, can then be projected. Naturally,
the projection cannot be made in terms of dollars and cents, but is best made
by direct comparison of identical functions realized with CCDs and any other
appropriate technology. Under these conditions, numbers such as speed, power,
and parts count can be tabulated and cross-correlated.

As a result of the proposal submitted to the Naval Research Laboratory,
TRW embarked on a study of the impact on signal processing systems of the use
of CCDs in the digital domain. The results of that study have been issued
under the title "Charge Coupled Devices in Signal Processing Systems; Volume I:
Digital Signal Processing".* Briefly stated, the study indicated that digital
CCDs combine the inherent advantages of any digital technology (such as high
noise immunity, freedom from device/parameter variations, stable operating
conditions, and ease in simulation) with the advantages peculiar to CCDs (such as
high density and low power). In addition, digital CCDs are best suited to
those signal processing applications where the signal flow can be carried out
in a pipelining fashion requiring little or no feedback; this permits relatively
high data throughput to be accomplished with the relatively low CCD
clock frequencies. Not surprisingly, the impact is most dramatic in those
situations where a large number of functions and/or high computational accuracy
is demanded. A large number of such instances occur in existing and projected
systems; these were identified and analyzed in some detail.

At the conclusion of the study, TRW recommended that an experimental
verification be carried out that would go beyond the basic device work already
accomplished and would demonstrate the real advantages of the approach. The
realization of a digital CCD fast Fourier transform on a chip was selected as
a useful vehicle; additionally this function, properly implemented, is quite
flexible and suited to a number of diverse situations. Accordingly, a technology
development program was begun. The objective of the first phase was
the investigation and characterization of the fundamental building blocks that
would be employed in a typical application. The results of this Phase 1 program
include the further development of a full adder circuit function; the
design and test of 4 + 4 adder and 3 x 3 multiplier arrays; and a study
made to determine a method of interconnecting a number of projected FFT chips
into a single system. These results have been issued under the title "Charge
Coupled Devices in Signal Processing Systems; Volume III: Digital Function
Feasibility Demonstration".

*Available from the National Technical Information Service; a companion
report, "Charge Coupled Devices in Signal Processing Systems; Volume II:
Analog Signal Processing", is also available.

The objective of this second phase, being described here, was to develop
large computational building blocks suitable for implementing an FFT or some
other similar function. Near the end of Phase 2, a potential application in
the area of voice processing arose which would ultimately require 16-bit arithmetic
blocks, i.e., a (16 x 16) multiplier and a (32 + 32) adder/subtractor. At
the end of the thirteen-month Phase 2 effort, work was completed on 8-bit arithmetic
block designs. Work on the larger blocks continued into Phase 3, beginning
with the design of a (32 + 32) adder/subtractor. The duration of the third phase
is dependent on the final application selected. The chronology of events is
summarized in Figure 1-1.


Figure  1-1.  Chronology  of  Program 


1.3  PHASE  1 REPORT  SUMMARY 

The Phase 1 report contains an overview of the entire program and a brief
statement of goals and approaches. This is followed by a discussion of the
development of the full-adder circuit function. The original concept is explained and
subsequent alterations to the original layout are described; both two and three
input adders are treated (Section 2). There are some hardware implications in
the several computational algorithms that can be used and these are examined in
Section 3. The primary test mask that was designed during Phase 1 is presented
along with a summary of the test results in Section 4. The process sequences being
employed to produce these devices are explained, and cross-sectional views of the
devices are given in Section 5. This is followed by a presentation of the results
of a study made to determine a method of interconnecting a number of the projected
FFT chips into a single system. The report concludes in Section 7 with a
recommendation for future work.

1.4 PHASE 2 REPORT SUMMARY


This report contains a commentary on the advantages of digital charge
coupled logic (DCCL) and makes a comparison with other current high density/
low power LSI technologies. A description of the basic equations necessary
for designing DCCL logic gates is included and the design of various logic
cells and arithmetic functions is discussed in Sections 1, 2 and 3. The
principles used in the design of pipelined multiplier and adder/subtractor
arrays are discussed in Section 4. The clocking schemes and test results
obtained on both arithmetic arrays and single arithmetic functions are
described in Sections 5 and 6. In Section 7, we describe the metal/polysilicon
and double polysilicon fabrication processes used. The report concludes with
Section 8 in which recommendations for the direction of future work are made.




2.  APPLICATION  OF  DIGITAL  CHARGE  COUPLED  DEVICES 
2.1  ADVANTAGES  OF  DCCL 

Our previous comments in Section 1.1 regarding the low power requirements
and high functional density capabilities of the CCD technology serve
to point out the distinct advantages of digital charge coupled devices versus
any other digital technology. It is more informative in the present context
to examine the advantages of digital charge coupled devices versus analog
charge coupled devices. Perhaps a good starting point is the different types
of signal representation used for each implementation.

The analog device takes one sample of the data and applies a significance
to the amplitude of that sample; the digital device takes one sample,
quantizes it into n bits and attaches a significance to the value of the
resulting n-bit word. Clearly this means that the DCCL requires an analog to
digital converter; this is not much of a penalty in today's systems, for in a
great number of them the data representation is already in digital form.

At first appearance it would seem likely that the n bits per sample would
require much more silicon area to perform the same function in the digital
form than the one sample an analog device would require. This, however, is not
necessarily the case; the fact that we must maintain an acceptable signal to
noise ratio in the operation requires us to utilize quite large areas for each
analog packet storage. On the other hand, the digital device can use extremely
small storage elements for each of the n bits.

This is so because the digital device has an inherently better noise
performance. In the analog operation any change in sample amplitude is a change
in signal amplitude. In the digital device, the signal can change by quite an
amount before this change is detected by the thresholding output circuit. In
fact, the properties of this output circuit provide the digital implementation
with one of its biggest noise margins. This circuit need only detect the
presence of a charge packet greater than a certain amount or less than a certain
amount and make its decision based on that information; this is distinctly
different from assigning a significance to the exact amplitude of a charge
packet as is required in the analog operation.

It is worthwhile to examine this question of the output circuit a little
further. Figure 2-1 is a representation of the statistical variation to be
found in the output packets present at the end of a digital operation. We note
that we get a Gaussian distribution in the number of packets for both a
one and a zero output. This is to say, there is some charge packet size
which is intended to represent a one bit and another charge packet size
which is intended to represent a zero bit. Due to the various noise sources
inherent in the device the exiting charge packets will not all be of exactly
the same amplitude. This produces the Gaussian distribution shown.


Now the output circuits need only distinguish between these two major
distributions; signals larger than the threshold point are interpreted as
having originated from a one bit and signals smaller than that as having
originated from a zero bit. In a statistical sense, the output circuit
inevitably misinterprets some signals; this represents the bit error rate of
the device, and is generally quite a small number. This operation strongly
contrasts with the analog output operation which depends upon a very linear
input to output function. In representing signals in the analog domain, it
is extremely important to minimize all noise sources and to control
environmental conditions as far as is possible. This is true because any loss (or
gain) of carriers from the charge packet throughout the analog system amounts
to a corresponding loss (or gain) in signal. The digital representation avoids
these problems by simply assigning two values to the charge packet (one or zero)
and placing the burden of distinction on the output circuit where the
distinction is easily made.
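The bit error rate implied by two Gaussian packet distributions and a single threshold can be estimated directly. The sketch below is illustrative only: it assumes both distributions share one standard deviation and that ones and zeros are equally likely by default, and the packet sizes are hypothetical values in arbitrary charge units, not measurements from this program.

```python
import math


def gaussian_tail(x):
    """Probability that a standard normal variable exceeds x."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bit_error_rate(mean_zero, mean_one, sigma, threshold, p_one=0.5):
    """Error rate of a thresholding output circuit (assumed model).

    A zero is misread when its packet exceeds the threshold; a one is
    misread when its packet falls below the threshold.
    """
    zero_read_as_one = gaussian_tail((threshold - mean_zero) / sigma)
    one_read_as_zero = gaussian_tail((mean_one - threshold) / sigma)
    return (1.0 - p_one) * zero_read_as_one + p_one * one_read_as_zero


# Hypothetical packets: zero = 0.0, one = 1.0, noise sigma = 0.1.
print(bit_error_rate(0.0, 1.0, 0.1, threshold=0.5))
print(bit_error_rate(0.0, 1.0, 0.1, threshold=0.3))
```

With equal spreads and equally likely bits, the midpoint threshold gives the lower error rate of the two cases printed; moving the threshold toward either distribution trades one kind of error for much more of the other.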

In addition to this, the effects of transfer efficiency differ between
the two types of devices. The device modulation transfer function (MTF)
greatly influences the signal representation; therefore the transfer efficiency
(through the MTF) is an important parameter for analog implementation
considerations. In the digital domain, however, the device operation is
typically independent of the transfer efficiency in the absolute sense and
therefore also independent of transfer efficiency variation from device to
device within very wide limits.

When  we  consider  the  effects  of  temperature  and  the  temperature  range 
over  which  various  devices  will  operate,  we  must  at  the  same  time  consider 
the  frequency  of  operation  of  these  devices.  This  is  a direct  result  of  the 
physics  involved  in  the  charge  coupled  device  technology.  Whether  the  signal 
is  in  digital  or  analog  form,  the  fact  of  the  matter  is  that  carriers  are 
collected  in  any  potential  minimum  that  exists  within  the  silicon;  these 
carriers  are  generated  by  dark  currents  and  their  total  quantity  is  also  a 
function of how long that potential minimum exists within the silicon. This
means that the temperature range is inextricably tied to the frequency of
operation, since the dark current is a function of temperature and the time a
potential minimum exists is a function of frequency.

While this effect is identical in both digital and analog operation, the
consequences of it are drastically different for the two cases. As we have
explained, any accumulation of carriers is an apparent increase in signal for
the analog representation. The digital device, however, can accumulate carriers
up to the point that the threshold of the output circuit will misinterpret a
previously designated zero bit as a one bit. As a result quite a large number
of carriers can be accumulated before the threshold circuit provides an
incorrect result.

This  means  that  the  digital  device  operation  is  not  adversely  affected 
by  an  increase  in  temperature  up  to  the  point  that  the  threshold  circuit  ceases 
to  operate  properly.  Beyond  that  point,  the  device  consistently  makes  errors. 
Therefore  the  range  of  temperature  over  which  a CCD  can  operate  in  a digital 
mode  is  quite  large  and  very  predictable. 

This temperature range is not independent of the frequency range as we
have just pointed out. This means that at any given temperature of operation
there is a minimum frequency of operation for proper device characteristics.
Or viewed another way, at any given frequency there is a maximum temperature
at which the device will operate.
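The temperature and frequency trade-off can be illustrated with a simple model. The sketch below is an assumption-laden illustration, not a model taken from this report: it assumes dark current doubles for every fixed temperature step (the 8 °C default is a rule-of-thumb placeholder), that a packet spends `n_transfers` clock periods between refresh points, and that the zero level may gain at most `charge_margin` carriers before the threshold circuit misreads it. All names and numbers are hypothetical.

```python
import math


def dark_current(temp_c, i_ref=1.0, t_ref_c=25.0, doubling_c=8.0):
    """Dark generation rate (carriers per second) under the doubling assumption."""
    return i_ref * 2.0 ** ((temp_c - t_ref_c) / doubling_c)


def min_clock_hz(temp_c, n_transfers, charge_margin,
                 i_ref=1.0, t_ref_c=25.0, doubling_c=8.0):
    """Lowest clock rate that keeps accumulated dark charge within the margin.

    Storage time is n_transfers / f, so the accumulated charge is
    dark_current * n_transfers / f, which must not exceed charge_margin.
    """
    rate = dark_current(temp_c, i_ref, t_ref_c, doubling_c)
    return rate * n_transfers / charge_margin


def max_temp_c(clock_hz, n_transfers, charge_margin,
               i_ref=1.0, t_ref_c=25.0, doubling_c=8.0):
    """Highest temperature at which the given clock rate is still adequate."""
    allowed_rate = clock_hz * charge_margin / n_transfers
    return t_ref_c + doubling_c * math.log2(allowed_rate / i_ref)


f_25 = min_clock_hz(25.0, n_transfers=100, charge_margin=1000.0)
f_33 = min_clock_hz(33.0, n_transfers=100, charge_margin=1000.0)
print(f_25, f_33, max_temp_c(f_33, 100, 1000.0))
```

Under these assumptions the minimum clock rate doubles with each doubling step in temperature, and shortening the path between refresh points (a smaller `n_transfers`) lowers the minimum rate in proportion.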

In connection with temperature considerations, it is important to
comment on the placement of refresh circuits throughout the digital function.
These circuits are required quite naturally at several points in most digital
operations. Consider, for example, what occurs when the results are available
from a multiplier; it is conceivable that the output of the multiplier may go
directly to the output pins of the device for use somewhere else in the system,
or the output of the multiplier may be inserted into some other on-chip function.
Since branching is required to do this, it is a natural place to put a charge to
voltage conversion circuit that will refresh the charge and be capable of
distributing it to several different places. This natural occurrence of refresh
circuits reduces the time that any potential minimum is required to exist in the
silicon. This means that the temperature range over which the device will
operate is generally quite large; operation up to +135°C is not uncommon.

Radiation effects are quite similar to temperature effects in certain
respects. Radiation generally does two things to harm the CCD operation: it
increases the general level of dark current; and it changes the device
thresholds. The increase in dark current can be viewed in the same way as an
increase in temperature. The change in threshold, however, is a different kind of
effect. The CCD has some margin to threshold variations due to its inherent
design. For large radiation doses the threshold shift is quite large and an
adjustment of the bias voltages is generally required in order to maintain
acceptable overall device characteristics. In general, the digital CCD
operation is quite immune to most of these variations; within its range of noise
margin, the device can accept changes in threshold and increases in dark current
without changing any of its operational characteristics.

The various types of functions that are achievable in the analog and the
digital domain differ also. The digital representation is capable of performing
logic functions in addition to arithmetic functions. The analog device can
provide multiplication and addition. There is a distinct difference in the
arithmetic functions provided by each type of device. The digital
representation has an accuracy in its calculation that is limited only by the input signal
quantization. If the input signal is quantized to 8 bits then the calculation
will be accurate to 8 bits. In the analog domain the accuracy is affected by
the device parameters because of transfer efficiency and also by the linearity
of the input-output circuitry. In addition, analog multiplication is also
affected by the weighting and tapping scheme used in the multiplier. This
tapping and weighting error problem has received a great deal of attention
from various workers in the analog signal processing field. The digital
multiplier, by contrast, is accurate to the number of bits in its representation.

One other item should be reiterated at this point. In the digital
representation, arithmetic functions are performed with pipeline techniques. This
means that the two n-bit words representing, for example, a multiplier and a
multiplicand are both accepted into the arithmetic function on one clock pulse;
they are then clocked through the function with succeeding pulses and eventually
the 2n product bits emerge all at one clock pulse. After the first multiplier
and multiplicand have been clocked into the function, a second set which
is completely independent of the first can be accepted on the next clock pulse.
A third set can be accepted on the third clock pulse and so on. All of the
intermediate products are shifted along and kept entirely distinct from each
other. Finally, at the output, each of the products arrives at its own point
in time and all of its output bits appear simultaneously. This entails a throughput
delay between the time the multiplier and multiplicand are first introduced
and the time that their result emerges as a product. But succeeding products
come out on every clock pulse. This is distinctly different from operation in
the analog domain where such multiplications can occur within one given clock
period and there is no throughput delay.
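The distinction between throughput delay and throughput rate can be shown with a timing-only model. This is a minimal sketch under stated assumptions: it treats the multiplier as a fixed-length shift pipeline and ignores the bit-level array inside it. The function name `pipelined_multiply` and the latency of six clocks are hypothetical.

```python
from collections import deque


def pipelined_multiply(operand_pairs, latency):
    """Return (clock, product) pairs for a pipeline with a fixed latency.

    One operand pair is accepted per clock. Each product is shifted through
    `latency` stages before it appears at the output.
    """
    pipe = deque([None] * latency)  # empty pipeline stages
    outputs = []
    # Append idle clocks so the last products are flushed out of the pipeline.
    stream = list(operand_pairs) + [None] * latency
    for clock, pair in enumerate(stream):
        pipe.append(None if pair is None else pair[0] * pair[1])
        finished = pipe.popleft()
        if finished is not None:
            outputs.append((clock, finished))
    return outputs


for clock, product in pipelined_multiply([(3, 5), (7, 2), (4, 4)], latency=6):
    print(clock, product)
```

The first product appears only after the full latency, but later products follow on consecutive clocks, so the sustained rate equals the clock rate regardless of the pipeline length.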

In  Table  1,  we  have  summarized  a number  of  the  salient  features  of  our 
comparison  between  digital  charge  coupled  devices  and  analog  charge  coupled 
devices. 

Table 1. Digital CCD and Analog CCD Comparison

Item: Signal Representation
  Digital: n bits per sample of input; requires handling n charge packets,
           but each packet is very small; an n-bit A/D is required.
  Analog:  One sample per sample of input; requires only one charge packet,
           but that packet must be large to maintain an acceptable S/N;
           no A/D is required.

Item: Input/Output Circuitry
  Digital: Simple, two-state, thresholding circuits; signal accuracy
           unaffected within large noise margins.
  Analog:  Linear circuits needed; signal accuracy a direct function of
           I/O circuit performance.

Item: Transfer Efficiency Effects
  Digital: Device operation typically independent of transfer efficiency
           (greater than 0.95 per transfer ensures proper operation).
  Analog:  The well-known device modulation transfer function (MTF) shows
           that any transfer loss is a signal degradation; both amplitude
           and phase effects are seen.

Item: Temperature Range
  Digital: Due to simple thresholding output circuit, dark current can
           accumulate right to the threshold without affecting the signal;
           a very large temperature range results over which operation is
           totally unaffected.
  Analog:  Any dark current accumulation is measured as signal and degrades
           the S/N; special design precautions are required to minimize
           this effect, but it cannot be eliminated.

Item: Maximum Frequency
  Digital: Limited by transfer efficiency at the point the thresholding
           circuit is affected, thus a higher frequency is achievable.
  Analog:  Limited by transfer efficiency effects.

Item: Functions Achievable
  Logic
    Digital: Yes
    Analog:  No
  Addition
    Digital: Yes, with an accuracy limited only by the input signal
             quantization.
    Analog:  Yes, with an accuracy limited by device parameters (such as
             transfer efficiency) and the I/O linearity.
  Multiplication
    Digital: Yes, with an accuracy limited only by the input signal
             quantization.
    Analog:  Yes, with an accuracy limited by device parameters (transfer
             efficiency), I/O circuit linearity and, more importantly, the
             tapping and weighting scheme used.
  Subtraction
    Digital: Yes, with an accuracy limited only by the input signal
             quantization.
    Analog:  No
  NOTE
    Digital: In performing arithmetic functions, pipeline techniques are
             used; thus after a throughput delay, results are available at
             each clock pulse.
    Analog:  In performing arithmetic functions, the operations occur
             within a clock period and results are available at each clock
             pulse; no throughput delay is required.


2.2  COMPARISON  OF  DCCL  WITH  OTHER  LSI  TECHNOLOGIES * 

Almost all previous power dissipation comparisons between different digital
techniques have been made at a single gate level; this is meaningless in a DCCL
application, so we have chosen to make the comparisons first on a full-adder
logic cell and then on large arithmetic arrays.

* R.  A.  Allen,  R.  J.  Handy,  J.  E.  Sandor,  "Charge  Coupled  Devices  in  Digital 
LSI",  1976  International  Electron  Devices  Meeting,  December  1976.  The  circuits 
used  for  this  comparison  of  technologies  were  described  in  this  paper. 

2.2.1 DCCL Full-Adder

DCCL does not require any dc current, but present digital functions such
as full-adders and regeneration cells that utilize a floating gate require
three clock phases. The comparison done here uses the present technique of
employing the three input full-adder implementation designed with 5 µm minimum
geometry. The area under each clock line of the full-adder was measured and
the capacitance calculated for each of the different silicon oxide thicknesses.
The difference in clock levels required for each clock phase was used
with the calculated capacitance to determine the power dissipation, CV²f.
This resulted in a total power dissipation of 29.6 µW at a clock frequency of
1 MHz. Since the power dissipation is a linear function of frequency, the
characteristic results in a straight line as shown in Figure 2-2.
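The CV²f calculation and its linear scaling with frequency can be sketched as follows. This is an illustration under stated assumptions: the per-phase capacitance and clock-swing values are hypothetical placeholders, since the measured values are not given here, and the 1 MHz dissipation is read as 29.6 µW.

```python
def clock_power_w(phases, freq_hz):
    """Total clock power as the sum of C * V^2 * f over the clock phases.

    `phases` is a list of (capacitance_farads, swing_volts) pairs, one per
    clock line.
    """
    return sum(c * v ** 2 * freq_hz for c, v in phases)


def scale_with_frequency(p_ref_w, f_ref_hz, f_hz):
    """Scale a known dissipation linearly to another clock frequency."""
    return p_ref_w * f_hz / f_ref_hz


# Hypothetical three-phase example: capacitances and swings are placeholders.
example_phases = [(0.10e-12, 10.0), (0.08e-12, 10.0), (0.12e-12, 8.0)]
print(clock_power_w(example_phases, 1e6))

# Scaling the quoted 1 MHz figure (read as 29.6 microwatts) to other rates.
for f_hz in (10e3, 100e3, 1e6, 10e6):
    print(f_hz, scale_with_frequency(29.6e-6, 1e6, f_hz))
```

Because every term is proportional to f, the dissipation plots as a straight line against clock frequency on log-log axes, with a slope of one decade of power per decade of frequency.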


Figure 2-2. Power Dissipation versus Clock Frequency for Full-Adders
Constructed from Various Semiconductor Technologies
(horizontal axis: clock frequency, 10 kHz to 100 MHz)


2.2.2 CMOS Full-Adder

The CMOS full-adder used in this comparison contained 28 devices and was
designed with 5µm minimum geometry and a 5 volt supply. It dissipates 870µW
at a clock frequency of 1MHz. The power dissipation characteristic for CMOS is
linear up to 10MHz and then typically changes to a much steeper curve as shown
in Figure 2-2.

2.2.3 P-MOS and N-MOS Full-Adders

Typical P-MOS and N-MOS full-adder schematics, each containing 18 devices, are
identical. A 5µm minimum geometry p-channel enhancement mode full-adder with
saturated loads will dissipate 12mW at a clock frequency of 1MHz and 100mW at
10MHz.

Because the minority carrier mobility of n-channel devices is twice that
obtained with p-channel units, the gates can be made smaller, and a 5µm minimum
geometry n-channel full-adder will dissipate 4.8mW at 1MHz and 40mW at 10MHz.
The  power  dissipation  versus  clock  frequency  characteristics  for  P-MOS  and 
N-MOS  full-adders  are  shown  in  Figure  2-2. 

2.2.4 Integrated Injection Logic Full-Adder

A delay-power product of 0.5pJ per gate for experimental I²L devices has
been reported by S. Bruederle and P. Smith*, with 0.8pJ a value that is more
commonly achieved in production. These reported figures were for 5µm devices,
and we will assume for this comparison a delay-power product of 0.8pJ per
inverter at low clock frequencies.

An I²L full-adder requires 26 inverters. The total delay-power product of the
full-adder is the delay-power product of a single inverter multiplied by
the number of inverters, i.e., 20.8pJ.

There are a maximum of five stage-delays through the full-adder. If we
assume that the five stage-delays, ds, can be contained within one half clock
period, then the full clock period equals 10ds, and for a clock frequency of
1MHz (ds = 0.1µs), the power dissipation is:

P = 20.8pJ / 0.1µs = 208µW
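
The arithmetic of this estimate can be written out as a short sketch. The function name is hypothetical; the constants are those stated in the text, and the relation applies only in the mid-frequency region where the delay-power product governs dissipation.

```python
# Sketch of the I2L full-adder estimate: total delay-power product divided
# by one stage delay, with five stage delays fitting in half a clock period.

INVERTERS = 26                 # inverters per I2L full-adder
PD_PER_INVERTER_J = 0.8e-12    # delay-power product per inverter (0.8 pJ)
STAGES_PER_HALF_PERIOD = 5     # maximum stage delays through the adder


def i2l_full_adder_power(f_clock_hz: float) -> float:
    """Return dissipation in watts at the given clock frequency."""
    period_s = 1.0 / f_clock_hz
    # Full clock period equals 10 stage delays.
    stage_delay_s = period_s / (2 * STAGES_PER_HALF_PERIOD)
    return INVERTERS * PD_PER_INVERTER_J / stage_delay_s


if __name__ == "__main__":
    print(f"{i2l_full_adder_power(1e6) * 1e6:.1f} uW at 1 MHz")
```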

The power dissipation versus clock frequency curve for an I²L full-adder
is shown in Figure 2-2. It should be noted that there are two break points on
the curve: one at 50KHz and the other at 5MHz. Below 50KHz, the power
dissipation is constant. The reason for this is that the common emitter current
gain of an npn transistor used in an I²L configuration falls to below 4 at 1µA.

* Stan Bruederle and Philip Smith, "Designing with I²L", 19/2 Wescon, Aug. 1975

Assume that in a combinational logic circuit half the inverters are
not conducting and half have a collector current of 1µA and a pnp emitter
current of 0.5µA at a supply of 0.8 volts. The average power dissipation of
such an inverter is then 0.6µW. This results in a low frequency power
dissipation of 15.6µW for the full-adder.
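
One reading of this estimate that reproduces the quoted figures is sketched below. The interpretation is an assumption: the conducting half of the inverters is taken to draw the sum of the collector and pnp emitter currents from the 0.8 volt supply, and the non-conducting half is taken to draw nothing. The names are hypothetical.

```python
# Sketch of the low-frequency (static) I2L estimate under the stated
# assumption: only the conducting half of the inverters dissipates.

SUPPLY_V = 0.8
COLLECTOR_A = 1.0e-6       # collector current of a conducting inverter
PNP_EMITTER_A = 0.5e-6     # pnp emitter current of a conducting inverter
CONDUCTING_FRACTION = 0.5  # half the inverters conduct
INVERTERS = 26             # inverters per full-adder


def average_inverter_power() -> float:
    """Average static dissipation per inverter in watts."""
    return CONDUCTING_FRACTION * (COLLECTOR_A + PNP_EMITTER_A) * SUPPLY_V


def full_adder_static_power() -> float:
    """Static dissipation of the full-adder in watts."""
    return INVERTERS * average_inverter_power()


if __name__ == "__main__":
    print(f"{average_inverter_power() * 1e6:.2f} uW per inverter")
    print(f"{full_adder_static_power() * 1e6:.2f} uW per full-adder")
```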

From 50KHz to 5MHz the power dissipation of an I²L circuit is limited by
the base-emitter and intercircuit capacitances and is inversely proportional
to supply current. At clock frequencies above 5MHz, the speed limitations
are due primarily to constraints imposed by the storage of minority carrier
charges in the npn emitter and in the pnp base. Additional limitations are
due to stray parasitic capacitances, the base resistance of the npn transistor
and the logic function implemented. These speed limitations of conventional
I²L result in a 15MHz maximum operating frequency.

2.2.5  Power  Dissipation  Comparisons  in  Arithmetic  Arrays 

DCCL  Arrays 

When a variety of systems is considered, certain functions appear repeatedly:
the fast Fourier transform, for example, requires multipliers and adders;
serial correlators require shift-registers, multipliers and accumulators;
frequency synthesizers require shift-registers and accumulators; digital
differential analyzers use adders and shift-registers to perform integration;
and some transforms require add and subtract functions.

Although the power dissipation of a full-adder is useful for comparing
various digital technologies, a comparison of the power dissipation of
arithmetic arrays is more meaningful.

A DCCL array requires half-adders, AND gates, charge refresh cells and
shift-register delays in addition to full-adders. Each of these logic cells
was treated in the same way as the full-adder discussed above, by calculating
the capacitance and power dissipation of each clock line.

In both DCCL adder and multiplier arrays in which a full-adder is implemented
from two half-adders, the transfer time through a full-adder is two
clock periods. Consistent with our current usage we now consider the delays,
cell count and packing densities when a full-adder configuration is used.

Each time that three bits are added together, the carry to the next higher
binary level is delayed one clock period, and the other input to the next
level adder must also be delayed one clock period by means of shift-register
delays. The sum-bit outputs will also be delayed as higher binary values of
the output number are generated, and the lower values must also have
delays inserted so that the output bits are not skewed.


A list of the number of cells required for the arithmetic arrays is
given in Table 2.

Table 2. Cell Count for Various DCCL Arrays

Cell type            16 + 16   32 + 32   8 x 8   16 x 16
Regeneration cells        75       343      62        89
AND gates                  0         0      64       256
Shift registers          360      1488     190      1328
Full-adders               15        31      47       214
Half-adders                1         1      11        16

2.2.6  Other  Digital  Technologies 

All of the multiplier arrays described by A. Habibi and P. A. Wintz* were
reviewed, and the cell counts did not vary significantly from the schemes used
for the DCCL arrays. Therefore, in calculating the power dissipation for
various technologies, we have assumed the same mix of cells that is listed in
Table 2.

For implementing the necessary delays in CMOS, P-MOS and N-MOS, we have
assumed that shift-registers are used, and that in the I²L array D-type
flip-flops are used.

The total power dissipation for various size arrays and technologies at
clock frequencies of 1MHz and 10MHz is listed in Tables 3 and 4. These power
dissipations are calculated for the specific cell mix described in Table 2,
and will vary slightly according to which scheme is used for adding the summands.


* A. Habibi and P. A. Wintz, "Fast Multipliers", IEEE Transactions on
Computers, pp. 153-157, February 1970

Table 3. Total Power Dissipation in Watts of Various Size
Arrays and Technologies at a Clock Frequency of 1MHz

Technology   16 + 16   32 + 32   8 x 8   16 x 16
DCCL           0.009     0.024   0.008     0.044
CMOS           0.582       2.3   0.820       4.1
P-MOS            2.9      13.3     2.5      15.0
N-MOS          0.531       2.3   0.559       3.1
I²L            0.040     0.174   0.036     0.215

Table 4. Total Power Dissipation in Watts of Various Size
Arrays and Technologies at a Clock Frequency of 10MHz

Technology   16 + 16   32 + 32   8 x 8   16 x 16
DCCL           0.089     0.237   0.077     0.444
CMOS             1.8       6.8     2.8      13.8
P-MOS            4.6      19.9     4.9      27.0
N-MOS           1.02       4.3    1.05       4.9
I²L            0.596       2.7   0.544       3.2
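
To make the comparison in Tables 3 and 4 easier to read, the entries can be expressed as ratios to the DCCL figure for the same array. The sketch below only re-tabulates the published values; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# Ratios of array power dissipation relative to DCCL, using the values
# listed in Tables 3 (1 MHz) and 4 (10 MHz). Units are watts.

ARRAYS = ("16+16", "32+32", "8x8", "16x16")

POWER_W = {
    "1MHz": {
        "DCCL": (0.009, 0.024, 0.008, 0.044),
        "CMOS": (0.582, 2.3, 0.820, 4.1),
        "P-MOS": (2.9, 13.3, 2.5, 15.0),
        "N-MOS": (0.531, 2.3, 0.559, 3.1),
        "I2L": (0.040, 0.174, 0.036, 0.215),
    },
    "10MHz": {
        "DCCL": (0.089, 0.237, 0.077, 0.444),
        "CMOS": (1.8, 6.8, 2.8, 13.8),
        "P-MOS": (4.6, 19.9, 4.9, 27.0),
        "N-MOS": (1.02, 4.3, 1.05, 4.9),
        "I2L": (0.596, 2.7, 0.544, 3.2),
    },
}


def ratio_to_dccl(freq: str, tech: str, array: str) -> float:
    """Return power of `tech` divided by power of DCCL for one array."""
    idx = ARRAYS.index(array)
    return POWER_W[freq][tech][idx] / POWER_W[freq]["DCCL"][idx]


if __name__ == "__main__":
    for freq in POWER_W:
        for tech in ("CMOS", "P-MOS", "N-MOS", "I2L"):
            row = ", ".join(
                f"{a}: {ratio_to_dccl(freq, tech, a):6.1f}x" for a in ARRAYS
            )
            print(f"{freq:>5} {tech:<6} {row}")
```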

2.2.7  Package  Density  Comparisons 

Digital CCD technology is an inherently high density technique because
a DCCL logic function is implemented by processing an existing packet
of charge, in contrast to other logic families in which a logic function is
implemented by a digital circuit built with several components. In addition,
DCCL layouts have four layers of interconnection. The silicon substrate acts
as a ground plane; the signal flow is along channels at or below the silicon
surface and forms the first interconnect layer. The two levels of polysilicon
that are insulated from each other can be used as cross-overs and are useful
for interconnecting electrodes within logic cells. The single metal layer forms
the bus lines for the clock phases and is the fourth interconnection layer.

The areas of various DCCL arithmetic arrays are listed in Table 5 and are
obtained from designed and fabricated chips. They are for the active circuit
areas and do not include room for input/output buffers, bonding pads or borders.

Table 5. Estimates for the Active Area in mm² of Various
Arithmetic Arrays Constructed from Different
Semiconductor Technologies

Technology   16 + 16   32 + 32   8 x 8   16 x 16
DCCL            2.92      8.94     3.1      28.0
P-MOS           11.3      49.2    12.2      67.7
N-MOS           7.78      34.7    7.65      44.2
CMOS            16.5      70.2    19.5
I²L             14.9      64.9    26.2       137

2.2.8  CMOS,  P-MOS  and  N-MOS  Arrays 

In calculating the areas of the various MOS arrays, a minimum geometry of
5µm is used. The MOS gate lengths were calculated for the required
transconductance using the 5µm minimum geometry for the MOS gate width. Thus,
knowing the gate lengths and widths and using an alignment tolerance of 2µm,
the areas of the logic cells could be computed.

The estimated areas of various CMOS, P-MOS and N-MOS arithmetic arrays
are listed in Table 5. The area for interconnection overhead is assumed to
be 100%.

2.2.9 I²L Arrays

If we assume a minimum geometry of 5µm and an alignment tolerance of 2.4µm,
then the layout of the D-type flip-flop illustrated in S. Bruederle's paper*
will be 0.021mm². A full-adder laid out with the inverters perpendicular to the
pnp emitter in the same way as the D-type flip-flop will measure 0.062mm². A
half-adder will measure 0.04mm² and an AND gate will measure 0.003mm².

The area estimates for various I²L arithmetic arrays are listed in Table
5. In the calculations for area, an interconnection overhead of 100% is used.

* Stan Bruederle and Philip Smith, "Designing with I²L", 19/2 Wescon, Aug. 1975

3. SYNTHESIS OF DCCL DESIGN EQUATIONS


3.1  DIGITAL  GATES 

Digital logic can be implemented with a two-level gate process such
as that used in standard analog CCDs. A logical one is defined as a charge
quantity equal to the capacity of a minimum geometry storage electrode,
and a logical zero is defined as an empty storage electrode.

The  logical  OR  function  is  the  easiest  function  to  implement.  The 
logical  OR  function  is  shown  in  Figure  3-1. 


Figure  3-1.  DCCL  OR  Gate 


When  a logical  one  is  transferred  from  either  the  A or  the  B input  under 
a common  storage  electrode  the  OR  function  occurs.  In  this  simple  OR  gate, 
the  common  storage  electrode  will  contain  a charge  quantity  which  is  twice  that 
of  a logical  one  when  both  A and  B are  ones.  This  condition  can  be  corrected 
by  providing  a potential  barrier  and  charge  sink  for  the  excess  charge  as  shown 
in  Figure  3-2. 


Figure  3-2.  DCCL  OR  Gate  with  Correction  for  1 + 1 Logic 

Realizing that the charge which is discarded is the AND function of A
and B, it is a natural extension of the basic OR gate to form an AND gate. As
shown in Figure 3-3, an AND function is implemented by saving the charge which
spills over the barrier electrode and sinking the OR function on an alternate
clock phase.

The AND gate may be altered to perform the exclusive-OR function. The
exclusive-OR function is shown in Figure 3-4.


In the exclusive-OR implementation, the output is taken from the OR
function output. However, the output is corrected for the (1 + 1) state by
detecting the AND output with either a floating gate or a floating diffusion
which raises the surface potential of the transfer gate and blocks the OR
output. Since the (1 + 1) state will leave a logical one charge packet under the
D electrode, a charge sink must be provided on an alternate clock phase. If
the AND function is not used, it must also be purged with a charge sink.

The next logical extension is implemented by taking the charge from the
D electrode of the exclusive-OR gate to an output instead of a charge sink.
The result is a half-adder which is shown in Figure 3-5.

The illustrated half-adder is currently being used as one of the fundamental
cells in large arithmetic arrays. It is of interest to note the outputs
which result when the A input is always supplied with a one. Under this
condition, the carry output will be logically equal to B and the sum output
will be logically equal to the complement of B. However, by the action of the
circuit, the resulting outputs are refreshed to a full logical one level. Hence,
the half-adder may be used to perform the refresh function which becomes
necessary to prevent signal degradation in large arrays.
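
The gate behaviour described in this section can be mimicked with a very small charge-counting model. This is a sketch under simplifying assumptions: charge is counted in whole logical-one packets, clocking and charge sinks are ignored, and the function names are hypothetical.

```python
# Minimal charge-packet model of the DCCL OR, AND, exclusive-OR and
# half-adder functions. Each input carries 0 or 1 packet of charge.


def or_with_overflow(a: int, b: int) -> tuple[int, int]:
    """Merge two packets under a one-packet storage electrode.

    Returns (stored, spilled): the stored packet is the OR function and
    the packet that spills over the barrier is the AND function.
    """
    total = a + b
    stored = min(total, 1)
    spilled = total - stored
    return stored, spilled


def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry) for the half-adder."""
    or_out, and_out = or_with_overflow(a, b)
    # The sensed AND packet blocks the OR output (exclusive-OR behaviour).
    sum_out = 0 if and_out else or_out
    return sum_out, and_out


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, half_adder(a, b))
```

With the A input held at one, `half_adder(1, b)` returns the complement of `b` on the sum output and `b` on the carry output, which is the refresh behaviour noted above.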


Figure 3-5. DCCL Half-Adder (labels: transfer gate, barrier and output sink, sum and carry outputs, truth table)


A full-adder  can  be  implemented  by  adding  a third  input  to  the  input 
AND  gate,  an  additional  barrier  and  storage  location  and  an  OR  gate. 


The  DCCL  full-adder  implementation  is  shown  in  Figure  3-6  along  with 
its  Truth  Table. 


(Figure labels: transfer, barrier, charge sensing element, barrier)

Truth Table

A B C   SUM   CARRY
0 0 0    0      0
0 0 1    1      0
0 1 0    1      0
1 0 0    1      0
0 1 1    0      1
1 1 0    0      1
1 0 1    0      1
1 1 1    1      1

Figure 3-6. DCCL Full-Adder and Truth Table
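
The full-adder truth table can be generated from a charge-counting view of the cell. The model below is an illustrative assumption rather than a description of the actual layout: the merged input charge is taken to fill up to three successive one-packet storage locations, and the outputs are derived from which locations are occupied.

```python
# Hypothetical overflow model of a three-input full-adder. The merged
# charge (0 to 3 packets) fills successive one-packet storage locations.


def full_adder_by_overflow(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry) from the number of occupied storage locations."""
    packets = a + b + c
    first = 1 if packets >= 1 else 0
    second = 1 if packets >= 2 else 0
    third = 1 if packets >= 3 else 0
    carry = second
    # A second packet blocks the first one; a third restores the sum.
    sum_out = third if second else first
    return sum_out, carry


if __name__ == "__main__":
    print("A B C  SUM CARRY")
    for a, b, c in [(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0),
                    (0, 1, 1), (1, 1, 0), (1, 0, 1), (1, 1, 1)]:
        s, cy = full_adder_by_overflow(a, b, c)
        print(a, b, c, "  ", s, "  ", cy)
```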


3.2  DCCL  LOGIC  CELL  DESIGN 


DCCL design begins with the selection of the dimensions for a minimum
geometry storage element. From these dimensions the dimensions of other
storage elements are determined in accord with the number of digital charge
packets they must store. Present DCCLs employ a 0.2mil² storage element.

However, the floating gate holding well, which is controlled by the
floating gate slave, must be designed to hold and transfer a digital one
charge packet. A schematic of the floating gate holding well with a
corresponding surface potential diagram is shown in Figure 3-7.


Figure 3-7. Floating Gate Slave Holding Well (labels: floating gate, slave gate)

The charge capacity of the floating gate holding well is given by

$$Q = A_s C_{ox}\,\Delta V_{fg} \qquad (1)$$

where $A_s$ is the area of the floating gate holding well, $C_{ox}$ is the oxide
capacitance per unit area, and $\Delta V_{fg}$ is the floating gate swing. The
floating gate swing is

$$\Delta V_{fg} = \Delta Q\,\frac{dV_{fg}}{dQ} \qquad (2)$$

where $dV_{fg}/dQ$ is the floating gate sensitivity and $\Delta Q$ is the magnitude
of a logical one charge packet.

Since the holding well must hold and transfer a logical one charge packet,
it follows that $Q = \Delta Q$ and

$$A_s = \left[ C_{ox}\,\frac{dV_{fg}}{dQ} \right]^{-1} \qquad (3)$$

This equation yields the minimum required holding well area. To provide
for noise immunity it is reasonable to double the area calculated.

The floating gate sensitivity can be calculated from expression (4).

[Expression (4), giving $dV_{fg}/dQ$ in terms of the quantities defined below, is not legible in the source.]    (4)



where

$$V_{FB} = \phi_{ms} - Q_{ss}/C_{ox}$$

$$V_0 = q N_d \epsilon_s / C_{ox}^2$$

$$\beta = (A_{fg} C_{ox} + C_{ext}) / (A_{fg} C_{ox})$$

$V_{fg}$ is the floating gate voltage

$A_{fg}$ is the area of the floating gate

$V_{FB}$ is the flat-band voltage

$q$ is the magnitude of the electronic charge

$N_d$ is the substrate donor impurity concentration

$\epsilon_s$ is the permittivity of silicon

$C_{ox}$ is the oxide capacitance per unit area

$Q_{ss}$ is the surface charge density

$\phi_{ms}$ is the gate/silicon work function difference

$C_{ext}$ is the external capacitance on the floating gate

Present digital CCL designs using 5µm geometries on 5 ohm-cm n-type substrates
yield typical values of $\beta = 1.6$, $dV_{fg}/dQ = 7.5 \times 10^{12}$ volts/coulomb,
and a floating gate swing of 1.2 volts. Calculating the area of the holding
well yields $A_s$ = 0.6mil², which dictates that the floating gate holding well
be three times the area of a logical one storage element.
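
The relations between packet charge, floating gate swing and holding well area can be checked numerically. The sketch below uses the sensitivity and swing quoted above. The oxide capacitance per unit area is not stated in the text, so the value in the example is a placeholder assumption (roughly that of a 1000 angstrom silicon dioxide layer), and all function names are hypothetical.

```python
# Sketch of the holding-well sizing relations. The oxide capacitance used
# in the example run is an assumed placeholder, not a value from the text.

MIL_CM = 2.54e-3  # one mil expressed in centimetres


def logical_one_charge(swing_v: float, sensitivity_v_per_c: float) -> float:
    """Packet charge in coulombs: swing divided by sensitivity."""
    return swing_v / sensitivity_v_per_c


def min_holding_well_area_cm2(c_ox_f_per_cm2: float,
                              sensitivity_v_per_c: float) -> float:
    """Minimum holding well area: 1 / (C_ox * dV_fg/dQ)."""
    return 1.0 / (c_ox_f_per_cm2 * sensitivity_v_per_c)


def cm2_to_mil2(area_cm2: float) -> float:
    """Convert an area from square centimetres to square mils."""
    return area_cm2 / (MIL_CM ** 2)


if __name__ == "__main__":
    sensitivity = 7.5e12      # volts per coulomb (from the text)
    swing = 1.2               # volts (from the text)
    c_ox_assumed = 3.45e-8    # F/cm^2, placeholder assumption
    dq = logical_one_charge(swing, sensitivity)
    area = min_holding_well_area_cm2(c_ox_assumed, sensitivity)
    print(f"packet charge  = {dq:.3e} C")
    print(f"minimum area   = {cm2_to_mil2(area):.3f} mil^2 (assumed C_ox)")
    print(f"doubled area   = {cm2_to_mil2(2 * area):.3f} mil^2")
```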

3.3 COMPARISON BETWEEN FULL-ADDER AND DUAL HALF-ADDER IMPLEMENTATIONS


3.3.1 Full-Adder Implementation

A half-adder accepts two inputs, a and b, and produces a sum S = 1 if
either input is 1, but not when both inputs are 1. The carry C = 1 if
both inputs are 1. Hence S = a ⊕ b and C = ab. A full-adder accepts three
inputs and produces a sum S = 1 when one or all three inputs are 1; thus, in
logical terms, S = c ⊕ (a ⊕ b). A carry C = 1 is produced when two or three
inputs are 1's; C = c(a ⊕ b) + ab. Hence, a full-adder can be realized
using two half-adders plus an OR gate as shown in Figure 3-8.


Figure  3-8.  A Full-Adder  Logic  Cell  Implemented  with 
Dual  Cascaded  Half-Adders 
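
The dual half-adder construction can be verified exhaustively with a few lines of code. This is a purely logical sketch with hypothetical function names; it says nothing about timing or charge handling in the actual cell.

```python
# Full-adder built from two cascaded half-adders and an OR gate, checked
# against ordinary binary addition for all eight input combinations.


def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry) with sum = a XOR b and carry = a AND b."""
    return a ^ b, a & b


def full_adder_dha(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry) using two half-adders and one OR gate."""
    s1, c1 = half_adder(a, b)      # first half-adder: a + b
    s, c2 = half_adder(s1, c)      # second half-adder adds the third input
    return s, c1 | c2              # the two carries are never both 1


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            for c in (0, 1):
                s, cy = full_adder_dha(a, b, c)
                assert s + 2 * cy == a + b + c
                print(a, b, c, "->", s, cy)
```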

There  are  trade-offs  in  the  choice  of  full-adders  and  dual  half-adders 
that  affect  the  maximum  clock  frequency,  power  dissipation,  signal-to-noise 
ratio,  transfer  efficiency,  propagation  time  and  silicon  area. 

3.3.2  Clock  Frequency 

For the large charge packets used in DCCLs, the transfer of charge is
dominated by self-induced drift. In a half-adder the φ1 clock is applied
to the "OR" gate, referred to as the D gate in Figure 3-6. The time duration
of the φ1 clock is determined by the time necessary for an input charge of
$2Q_0$ to fill the input storage area D, transfer over the barrier and fill the
storage area X under the master side of the floating gate, as shown in Figure 3-6.

The time required for the initial $2Q_0$ charge to fall until the surface
potential is equal to the thermal voltage kT/q has been shown to be

$$t_{HA} = \frac{L_{HA}^3 W C_{ox}}{4 \mu Q_0} \qquad (5)$$

where $L$ is the total length of the electrodes over the input gate, the D and
X storage gates and the intermediate transfer gate, $W$ is the channel width,
$C_{ox}$ is the oxide capacitance per unit area, $\mu$ is the mobility of the
carriers, and $(\phi_1 - \phi_2) = 2Q_0/(L W C_{ox})$ corresponds to the initial
input charge. The potential difference $\phi_1 - \phi_2$ is the difference in
surface potentials of a full charge packet $Q_0$.

At  the  end  of  the  self-induced  drift  period,  the  remaining  input  charge 
has  a surface  potential  of  26mV  at  room  temperature  and  is  swept  out  by  the 
fringing  fields. 


The full-adder has an additional transfer area and storage gate that has
to fill when the initial input charge is $3Q_0$. The self-induced drift period
for the full-adder is

$$t_{FA} = \frac{L_{FA}^3 W C_{ox}}{6 \mu Q_0} \qquad (6)$$

The ratio in self-induced drift time between a full-adder and a half-adder is

$$\frac{t_{FA}}{t_{HA}} = \frac{2}{3} \left( \frac{L_{FA}}{L_{HA}} \right)^3 \qquad (7)$$



For the specific designs described here, $L_{HA}$ = 1.4 mil and $L_{FA}$ = 2.6 mil.
The φ1 period for the full-adder will be 2.1 times that required for the
half-adder.

The clock period for full-adders and half-adders can be divided into two
periods: the period in which the charges are equalizing while the φ1 clock is
negative, and the period in which the φ1 clock is positive and the other clocks
are controlling the charges. In a half-adder, the first period is approximately
40% or 0.4t and the second period is 60% or 0.6t. In a full-adder the first
period is 2.1 x 0.4t = 0.84t so that the total time for a full-adder is 1.44t,
compared to 1t for a half-adder.
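
The clock period estimate is simple enough to restate as a short calculation. The sketch below uses only the 40%/60% split and the factor of 2.1 quoted in the text; the function and parameter names are hypothetical.

```python
# Clock period of a full-adder relative to a half-adder, using the split
# between the charge-equalizing interval and the remaining clock activity.


def full_adder_period(t_half_adder: float,
                      drift_ratio: float = 2.1,
                      drift_fraction: float = 0.4) -> float:
    """Return the full-adder clock period for a given half-adder period."""
    drift = drift_ratio * drift_fraction * t_half_adder   # stretched part
    control = (1.0 - drift_fraction) * t_half_adder       # unchanged part
    return drift + control


if __name__ == "__main__":
    print(f"full-adder period = {full_adder_period(1.0):.2f} t")
```

Under these figures a full-adder array can be clocked at roughly 1/1.44, or about 69%, of the rate of a half-adder array.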

3.3.3  Power  Dissipation 

The power consumed in a DCCL is only that power required to charge the
gate capacitance to each clock voltage. The capacitance of the φ1 clock line
to the full-adder is approximately 1.8 times that of a half-adder and, except
for that clock line, all the other capacitances are identical. This additional
capacitance causes a full-adder to dissipate 20% more power than a half-adder.
However, when the dual half-adders are used to implement a full-adder function,
the configuration requires two one-bit shift-registers and an OR gate. These
additional elements added to the two half-adders result in an overall power
dissipation that is 2.5 times that of a full-adder.


3.3.4 Signal-to-Noise Ratio

A half-adder requires one input and one output port to the storage area
under the master side of the floating gate. The spacing between the polysilicon
gates covering the two ports is 5µm, which results in a minimum storage area of
A = 52µm².

The additional channel to the intermediate storage area in a full-adder
requires that a second output port be added to the storage area under the master
side of the floating gate. The polysilicon spacing required by this additional
output channel doubles the storage area to A = 104µm².


Increasing the storage area results in an increase in the capacitive drive
of the floating gate,

$$\frac{1}{\beta} = \frac{A_{fg} C_{ox}}{A_{fg} C_{ox} + C_{ext}} \qquad (8)$$



However, it also reduces the change in surface potential under the master
side of the floating gate as shown in expression (4). The net result is a
decrease in $\Delta V_{fg}$ that is a nonlinear function of the area A. A decrease
in $\Delta V_{fg}$ will reduce the voltage difference between the slave side of the
floating gate acting as a transfer gate and acting as a charge barrier gate.
The reduction in voltage change may allow some charge to spill over the
floating gate when it is in the barrier mode. Thus an increase in A results in
decreased noise immunity.

3.3.5  Transfer  Efficiency 

It is not feasible to use a "fat-zero" in implementing arithmetic functions
with DCCLs, so that typically in our current units we obtain a transfer
efficiency of only 0.998. There are two transfers through a full-adder,
resulting in a transfer efficiency of 0.996. In a dual half-adder configuration
there are four transfers, producing a transfer efficiency of 0.992. In the
layout of a large pipeline arithmetic array it will therefore be necessary to
insert a level of charge refresh cells twice as frequently when dual half-adder
configurations are used.
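
The compounded efficiencies follow from raising the per-transfer figure to the number of transfers, as the short sketch below shows. The function name is hypothetical.

```python
# Cumulative transfer efficiency after a number of successive transfers,
# assuming each transfer has the same efficiency.


def cascaded_efficiency(per_transfer: float, transfers: int) -> float:
    """Return the fraction of charge remaining after `transfers` transfers."""
    return per_transfer ** transfers


if __name__ == "__main__":
    eta = 0.998
    print(f"full-adder (2 transfers):      {cascaded_efficiency(eta, 2):.3f}")
    print(f"dual half-adder (4 transfers): {cascaded_efficiency(eta, 4):.3f}")
```

Because the loss per logic level is doubled in the dual half-adder case, a given loss budget is reached after half as many levels, which is why refresh cells are needed twice as often.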

4. IMPLEMENTATION OF PIPELINE ARITHMETIC ARRAYS


In this section we describe the design of the arithmetic arrays implemented
on the DP2 and DP3 chips. Both chips utilize surface p-channel two-phase
CCD technology. They differ in geometry and processing, the DP2
utilizing 7.6µm minimum geometry and a metal-polysilicon structure, whereas
in the DP3 the geometry is reduced to a minimum of 5.1µm and a double
polysilicon gate structure is used.

4.1  THE  DP2,  4-BIT  + 4-BIT  ADDER  ARRAY 

The  DP2  4 + 4 adder  array  uses  the  dual  cascaded  half-adders  to  perform 
the  arithmetic. 


The addition of two 4-bit binary numbers a0, a1, a2, a3 and b0, b1, b2,
b3, in which a0 and b0 are the least significant bits, is performed with DCCL
in a straightforward manner.

First Word             a3   a2   a1   a0
Second Word            b3   b2   b1   b0
Carry Bits        c4   c3   c2   c1
SUM               S4   S3   S2   S1   S0

(Carry bit cn is generated by column n-1.)

A block diagram of the DP2, 4 + 4 adder array is shown in Figure 4-1 and
a photograph of a processed array is shown in Figure 4-2. The 4 + 4 adder
utilizes seven half-adders, three OR gates, fifteen single-bit shift-registers
and five output buffers. There is a propagation delay of four clock phases
through the 4 + 4 array. The results of tests carried out on the 4 + 4 array
are described in Section 5.1.
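
The half-adder and OR gate counts follow directly from the structure of a ripple adder built from dual cascaded half-adders. The sketch below is a logical model only: it ignores the pipeline delays and shift-registers, and its function names are hypothetical. The optional `carry_in` flag covers the variant in which the least significant position also uses a full-adder.

```python
# Cell counts and a logical check for an n-bit adder in which each
# full-adder is two cascaded half-adders plus an OR gate.


def dha_adder_cell_count(n_bits: int, carry_in: bool = False) -> dict:
    """Return the number of half-adders and OR gates required."""
    full_adders = n_bits if carry_in else n_bits - 1
    lone_half_adders = 0 if carry_in else 1   # least significant position
    return {
        "half_adders": 2 * full_adders + lone_half_adders,
        "or_gates": full_adders,
    }


def ripple_add(a: int, b: int, n_bits: int) -> int:
    """Add two n-bit words bit by bit using the dual half-adder logic."""
    carry = 0
    result = 0
    for i in range(n_bits):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        s1, c1 = ai ^ bi, ai & bi          # first half-adder
        s, c2 = s1 ^ carry, s1 & carry     # second half-adder
        result |= s << i
        carry = c1 | c2                    # OR gate combines the carries
    return result | (carry << n_bits)


if __name__ == "__main__":
    print(dha_adder_cell_count(4))
    print(ripple_add(0b1011, 0b0110, 4))
```

For four bits the count is seven half-adders and three OR gates, matching the figures above; the same rule gives the counts quoted for the larger dual half-adder arrays in Sections 4.2 and 4.4.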

4.2  THE  DP2,  8-BIT  + 8-BIT  (DHA)  ADDER  ARRAY 

The  DP2  contains  two  8 + 8 bit  adder  arrays.  In  the  first  array  the 
arithmetic  is  performed  with  cascaded  dual  half-adders  (DHA)  and  in  the  other, 
the  arithmetic  is  performed  with  full-adders  (FA). 


The 8 + 8 (DHA) adder array is an extension of the 4 + 4. The addition
of the two 8-bit binary numbers a0 - a7 and b0 - b7 is performed with DCCL in
the same manner as in the 4 + 4 array.

A block diagram of the DP2, 8 + 8 (DHA) adder array is shown in Figure 4-3.
The array utilizes fifteen half-adders, seven OR gates, seventy-seven
single-bit shift-registers and eight output buffers. There is a propagation
delay of eight clock phases through the 8 + 8 (DHA) adder. A photograph of a
processed array is shown in Figure 4-4, and the results of tests carried out on
the 8 + 8 (DHA) array are described in Section 5.2.

4.3  THE  DP2,  8-BIT  + 8-BIT  (FA)  ADDER  ARRAY 

The  8+8  (FA)  adder  array  performs  the  addition  of  two  8-bit  numbers  in 
the  same  manner  as  the  8+8  (DHA)  described  in  Section  4.2.  A block  diagram 
of  the  DP2,  8+8  (FA)  adder  array  is  shown  in  Figure  4-5.  This  array  utilizes 
one  half-adder,  seven  full-adders,  eighty-four  single-bit  shift  registers  and 
eight  output  buffers.  There  is  a propagation  delay  of  eight  clock  phases  through 
the  8 + 8 (FA)  adder.  A photograph  of  a processed  array  is  shown  in  Figure  4-6. 

4.4  THE  DP3,  16-BIT  + 16-BIT  ADDER  ARRAY 

The 16 + 16 adder array performs the addition of two 16-bit binary numbers
in the same pipeline manner described for the 8 + 8 (DHA) adder array
in Section 4.2. A block diagram of the 16 + 16 adder array is shown in Figure
4-7. The full-adder cells used in the 16 + 16 array are composed of dual
cascaded half-adders and an OR gate as described in Section 5.1. A change was
made to the basic design by utilizing a full-adder for the least significant
sum rather than a half-adder. The additional input allows us to cascade arrays
to handle words of any bit length by using the "carry-in" feature.

The packets of charge propagating through the array undergo seventeen
transfers. Due to the low transfer efficiency obtained through the DP2
arithmetic arrays, we decided to insert two levels of charge refresh cells in
the 16 + 16 array.

The first level of charge refresh was inserted after nine or ten transfers
and the second level after the seventeenth transfer, immediately before the
charge packet is converted to a voltage signal by the output buffer. The 16 + 16
array utilizes thirty-two half-adders, sixteen OR gates, forty charge refresh
cells, three hundred and forty-one single stage shift-register delays and
seventeen output buffers. The design of the full-adders is discussed in Section
5.1, the charge refresh cell in Section 5.9 and the output buffer in Section
5.11. There is a propagation delay of seventeen clock phases through the 16
+ 16 adder array. A photograph of the processed array is shown in Figure 4-8,
and the results of tests carried out on the 16 + 16 array are described in
Section 5.3.

The two large square areas shown in Figure 4-8, near the adder array,
are two polysilicon/silicon dioxide capacitors that are used to evaluate the
C-V characteristics during semiconductor processing evaluation.

4.5  THE  DP2,  3-BIT  X 3-BIT  MULTIPLIER  ARRAY 

The operations required to multiply two 3-bit binary numbers are performed
in the following manner.

The nine summands must be formed with logic AND gates and then added by
columns (with carries) to form the product p1, p2, ..., p6. Note, for
example, that if a1 = a2 = a3 = b1 = b2 = 1, then one carry is produced in the
generation of p2 and two carries are produced in the generation of p3.

The block diagram of the DP2 3 x 3 multiplier array is given in Figure 4-9,
which shows that the generation of p1 only requires an AND gate; the generation
of p2 requires a half-adder, but the generation of p3 and p4 each require three
half-adders. The 3 x 3 array utilizes nine AND gates, nine half-adders, three
OR gates, seventeen 1-bit shift-registers and six output buffers.

There is a propagation delay of four clock phases through the 3 x 3
multiplier. A photograph of the DP2, 3 x 3 multiplier is shown in Figure 4-10
and the test results are discussed in Section 5.4.

4.6  THE  DP3,  8-BIT  X 8-BIT  MULTIPLIER  ARRAY 

The 8 x 8 multiplier requires many more operations than the 3 x 3, as shown
in Table 6.

TABLE 6. 16 X 16 ARITHMETIC


The 64 summands are formed by 64 AND gates and then added by columns
(with carries) to form the products p1, p2, ..., p16. Note, for example,
that if all inputs are a binary "1" then two carries are formed in the
generation of p3, so that a total of six entries must be added to generate p4.
This increase in entries continues until we reach p9; this column has seven
summands plus four carries, resulting in a total of eleven entries. Thus,
column p9 requires five dual cascaded half-adders, or ten half-adders, for
implementation.
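
The column bookkeeping can be sketched in a few lines. The summand count per column follows from the structure of an n x n multiplier; the adder count assumes, as an illustration, that each three-input adder removes two entries from a column and that a leftover pair is handled by a single half-adder. The number of incoming carries depends on the summation scheme and is therefore passed in rather than computed. Function names are hypothetical.

```python
# Column bookkeeping for an n x n array multiplier with columns numbered
# from 1 (least significant), as in the text.


def summands_in_column(n_bits: int, k: int) -> int:
    """Number of AND-gate summands that fall in product column k."""
    if k < 1 or k > 2 * n_bits - 1:
        return 0                      # column 2n receives carries only
    return min(k, 2 * n_bits - k)


def adders_for_entries(entries: int) -> tuple[int, int]:
    """Return (three_input_adders, lone_half_adders) to reduce a column
    of `entries` bits to a single sum bit."""
    if entries < 2:
        return 0, 0
    three_input = (entries - 1) // 2  # each removes two entries
    lone_half = 1 if entries % 2 == 0 else 0
    return three_input, lone_half


if __name__ == "__main__":
    n = 8
    total = sum(summands_in_column(n, k) for k in range(1, 2 * n + 1))
    print("total summands:", total)
    print("summands in p9:", summands_in_column(n, 9))
    print("adders for 11 entries:", adders_for_entries(7 + 4))
```

For the p9 column, seven summands plus four carries give eleven entries and five three-input adders, i.e. ten half-adders when each is built as a dual half-adder.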

A block diagram of the DP3, 8 x 8 multiplier is shown in Figure 4-11.
This array contains 64 AND gates, 111 half-adders, 48 OR gates, 154 charge
refresh cells, 466 single-bit shift-registers and 16 output buffers.

There is a propagation delay of 32 clock phases through the 8 x 8
multiplier. A photograph of the DP3, 8 x 8 multiplier is shown in Figure
4-12 and the test results are discussed in Section 5.5.

5. FUNCTIONAL TESTING OF ARITHMETIC ARRAYS

5.1 TESTING THE DP2, 4+4 ARRAY

The 4+4 array processes the data in parallel; thus the two 4-bit
numbers are applied synchronously and the outputs are also available
synchronously. The design of the DP2, 4+4 array is described in Section 4.1.

The  test  procedure  described  in  this  section  was  carried  out  on  a wafer, 
using  a probe  station. 

Gate voltage versus surface potential (Vg/φs) plots were made of
several chips on each wafer processed. A typical Vg/φs plot of a p-channel
DP2 wafer is shown in Figure 5-1, and a block diagram of a single DP2
half-adder cell is shown in Figure 5-2. A surface potential diagram of the
half-adder sum and carry channels was derived as shown in Figure 5-3. The gates
referenced in Figure 5-3 correspond to the gates shown in Figure 5-2.

The set of clock waveforms shown in Figure 5-4 is necessary to produce
the required surface potentials in the correct phase; these waveforms were
derived from the Vg/φs plots and applied to the logic cell under test.

The eight input lines were exercised through all possible sixteen
combinations and the output from the array was observed on a CRT. It was seen
that the five output lines produced the correct output data for each input
combination.

CRT  photographs  of  the  4+4  sum  outputs  for  various  input  combinations 
are  shown  in  Figures  5-5,  5-6,  5-7,  and  5-8. 

5.2  FUNCTIONAL  TESTING  OF  THE  DP2,  8+8  ARRAYS 

There are two 8+8 adder arrays on the DP2 chip. One design utilizes
dual half-adders and is described in Section 4.2. The other design is made
up of full-adders and is described in Section 4.3.
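As a point of reference, the textbook way to obtain a full-adder from two cascaded half-adders and an OR gate is sketched below, together with an 8+8 ripple adder that yields nine output bits. This is a logical model under that assumption only; the actual DP2 cell layouts and timing are those of Sections 4.2 and 4.3, and the function names are illustrative.

```python
def half_adder(a, b):
    """Return (sum, carry) for two 1-bit inputs."""
    return a ^ b, a & b


def full_adder_from_half_adders(a, b, carry_in):
    """Full-adder built from two cascaded half-adders plus an OR gate."""
    s1, c1 = half_adder(a, b)
    s, c2 = half_adder(s1, carry_in)
    # c1 and c2 can never both be 1, so OR is sufficient to merge them.
    return s, c1 | c2


def add_words(a_bits, b_bits):
    """Add two equal-length little-endian bit lists.

    Returns len(a_bits) + 1 output bits (sum bits plus the final carry).
    """
    carry = 0
    out = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder_from_half_adders(a, b, carry)
        out.append(s)
    out.append(carry)
    return out


def to_bits(value, width):
    """Little-endian bit list: index 0 is the least significant bit."""
    return [(value >> i) & 1 for i in range(width)]


if __name__ == "__main__":
    result = add_words(to_bits(200, 8), to_bits(100, 8))
    print("sum bits s0..s8:", result)
```

Two 8-bit words produce nine output bits, matching the nine output channels of the 8+8 arrays.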

Several mask errors were made on the full-adder version which prevented
it from functioning. Because we planned on using the dual half-adder concept
on the DP3 layout, and since the DP3 mask set was soon to be completed, we
decided not to procure a corrected version of the DP2 mask set.

The dual half-adder version of the 8+8 adder arrays was connected up
with the same clock voltages derived for the 4+4 array, as shown in Figure
5-4.

Figure 5-5. Input to 2-Word 4-Bit Adder

Figure 5-6. Input to 2-Word 4-Bit Adder

All nine output channels responded correctly to the input signals, as
shown in the oscilloscope photographs in Figures 5-9, 5-10, 5-11, 5-12, and
5-13. In these figures, ai and bi are the input bits, where a0 and b0 represent
the least significant bits and si represents the sum.


5.3  TESTING  OF  THE  DP3,  16  + 16  ARRAY 

In an effort to reduce the number of individual clock lines required by
the DP2 designs, we studied the problem and carried out empirical testing on
single half-adder test cells. As a result, we determined that the dual
half-adder would function correctly with only five clock lines plus the inject
diode and the φR refresh clock.

The DP3, 16 + 16 adder array design was based on this concept.
Unfortunately, we experienced a race condition that prevented us from being
able to test the 16 + 16 array. This problem is described in some detail in
Section 6.1.

5.4  TESTING  OF  THE  DP2,  3X3  ARRAY 

The 3-bit x 3-bit multiplier array was designed with the same dual
half-adder and single half-adder cells used in the 4+4 and 8+8 arrays. The same
clock waveforms shown in Figure 5-4 were applied to the 3x3 multiplier array
and it performed the correct operation for all fifteen input states.

Oscilloscope photographs of the output for several different input
combinations are shown in Figures 5-14, 5-15, 5-16, 5-17, 5-18, and 5-19.
In these figures, ai and bi represent the input bits and pi represents the
product output.


Figure 5-15. Input to the 2-Word, 3-Bit Multiplier

    ai = 1 1 1
    bi = 1 0 0        (1.5 V/cm)
    pi = 0 1 1 1 0 0


5.5  FUNCTIONAL  TESTING  OF  THE  DP3,  8X8  ARRAY 

The design of the 8 x 8 array has the same logic cells as the 16 + 16
array described in paragraph 5.3. It therefore has the identical race
condition as the 16 + 16 array and has so far proven to be untestable.


6.  FUNCTIONAL  TESTING  OF  DP3A  ARITHMETIC  CELLS 


Wafers were processed from the DP3 masks and various logic cells were
tested. It was soon found that there were many layout errors; it was also
found that some of the test cells were too complex to characterize and
analyze easily.

It  was  therefore  decided  to  correct  the  masks  and  simplify  some  of  the 
test  cells.  The  new  mask  set  was  designated  DP3A. 

Both 16+16 and 8x8 arithmetic arrays on the DP3A were designed from
interconnected full-adders formed from dual cascaded floating-gate half-adders.
In order to check the correct design of this basic array cell, a single
full-adder of the identical design used in the array was placed on the DP3A
mask set as a test cell.

In addition to the dual-cascaded floating-gate half-adder, layouts of
design variations of half-adders and full-adders were also included as test
cells on the DP3A layout.

6.1  TESTING  THE  FLOATING-DIFFUSION  HALF-ADDER 

A single  half-adder  with  a floating-diffusion  charge  sensing  switch  was 
included  in  the  DP3A  design.  A block  diagram  of  this  half-adder  test  cell  is 
shown  in  Figure  6-1. 

In Figure 6-1, the double lines ending in an arrow represent a channel
path and charge transfer direction. A single line represents a clock line or
metal signal path. The large areas at the end of a channel arrow represent
charge storage gates and the narrow rectangles with a channel crossing through
them represent transfer gates. The cross-hatched areas represent diffused
areas. This symbolic representation of a DCCL function is used throughout
this section.

A typical Vg/φs plot of a DP3A wafer is shown in Figure 6-2; from these
curves the potentials of the waveforms shown in Figure 6-3 were derived.

The clock and data waveforms shown in Figure 6-3 were used to test the
half-adder, either on the wafer or as a packaged unit. Figures 6-4 and 6-5
show the correct operation of the half-adder tested on the wafer with a 100 kHz
clock frequency.

FIGURE 6-3. WAVEFORMS ASSOCIATED WITH THE FLOATING-DIFFUSION
HALF-ADDER TEST CELL
(Traces: DATA IN, SOURCE DIODE, φ1 CLOCK, φ2 CLOCK, φ3 CLOCK, φ4 CLOCK,
φ5 CLOCK, φ RESET)


Packaged half-adders were functionally tested over a wide temperature
range in order to determine their minimum and maximum operating frequencies.
During the testing it was determined that the half-adder performed correctly
with a clock frequency of 6.6 MHz at a temperature of [illegible], as shown in
Figure 6-6.

A curve of the operational range of the half-adder as a function of
temperature is shown in Figure 6-7. However, it must be pointed out that both
the low frequency of 10 kHz and the high frequency of 6.6 MHz are limited by the
design of the pulse generator used in the testing (and by the slew rate of the
MOS output circuit in the 6.6 MHz case).


Figure 6-4. Functional demonstration of floating-diffusion
half-adder with an input of A = 1, B = 1 at a
clock rate of 100 kHz.


(Traces: A-in, B-in, Sum Out, Carry Out)

Figure 6-5. Functional demonstration of floating-diffusion
half-adder with an input of A = 1, B = 0 at a clock rate
of 100 kHz.


(Traces: Sum, Carry)

Figure 6-6. Functional demonstration of floating-diffusion
half-adder with an input of A = 10101111 and B = 01011111
at a clock rate of 6.5 MHz. The output is slew-rate
limited by the final MOS circuit.


FIG 6-7. THE OPERATIONAL FREQUENCY RANGE OF A
PACKAGED FLOATING-DIFFUSION HALF-ADDER AS A
FUNCTION OF TEMPERATURE
(Horizontal axis: OPERATING TEMPERATURE IN DEGREES CENTIGRADE)


6.2  TESTING  THE  FLOATING-GATE  CASCADED  DUAL  HALF-ADDERS 

Both  the  16  + 16  and  8x8  arrays  designed  on  the  DP3A  chip  used  the 
floating-gate  cascaded  dual  half-adder  to  implement  the  full-adder  arithmetic 
function.  A block  diagram  of  this  arithmetic  cell  is  shown  in  Figure  6-8  and 
an  isolated  sample  of  the  cell  is  included  on  the  DP3A  for  evaluation  purposes. 

The dual half-adder was made to perform the correct arithmetic function;
however, the transfer efficiency was poor and the amplitude of the output charge
packets was small. Consequently we felt that it would not be useful to pursue
array operation, since an array uses several cascaded arithmetic cells. The
major difficulty in operation was found to be due to tying together, on the
metal interconnection pattern, a sink-diode gate and the carry-out gate to the
same φ4 clock line (as shown in Figure 6-8). This design approach was taken in
an effort to reduce the number of separate clocks that are required in an array.
A test of this arrangement had been made on the DP2 mask set previously with
satisfactory results.

The failure occurs when two or more inputs to the adder are each a binary
one. In this case, the input storage area overflows and fills the area under
the master-end of the floating-gate, correctly causing the surface potential
on the slave-end of the floating-gate to switch from a transfer to a barrier
level. This barrier prevents the output charge packet from transferring out
of the sum port. During the next clock phase, when φ4 switches to its negative
level, the charge retained by the slave-end of the floating-gate should transfer
out under the φ4 carry-out gate. However, the charge under the master-side of
the floating-gate is removed by a gate and diode also tied to the same φ4 clock.
This provides a race condition, but what is worse, the gate and diode
combination moves the surface potential under the floating-gate to a value that
is a function of the φ4 voltage. This modified surface potential switches the
slave side of the floating-gate back to a transfer mode and the output charge
incorrectly transfers out the sum port.

Two changes were made to the clock phases. First, the timing of the φ3 clock
was adjusted so that it switched to its less negative potential before φ4
switched to its most negative level. This caused the sum output gate to act
as a barrier. The output charge was then retained under the slave-end of the
floating-gate.

FIGURE 6-8. BLOCK DIAGRAM OF A DP3 FLOATING-GATE, DUAL HALF-ADDER DCCL CELL.

Second, since the φ4 sink gate to the master-end of the floating-gate is a
second level polysilicon gate and the φ4 carry-out gate is a first level
polysilicon gate, there is a 2 to 3 volt surface potential difference
between them. A step voltage was therefore applied to the φ2 clock so that the
storage gate which is located between the slave-end of the floating-gate (at
the poly 2 surface potential) and the carry-out gate (at the poly 1 surface
potential) would be at a surface potential midway between the two φ4
surface potentials, as shown in Figure 6-9. This technique proved
satisfactory. The arithmetic functions were correct. However, the amplitude
of the outputs was greatly diminished.


Figure 6-9. Gate structure and surface potentials of a DP3A
half-adder showing the conditions that have to exist in order
that the charge packet under the slave side of the
floating-gate will transfer out the carry port.


6.3  TESTING  THE  FLOATING-DIFFUSION  CASCADED  DUAL  HALF-ADDER 

Up until the DP3 mask set, we had always used a floating-gate sensing
device. However, it appeared that floating-diffusion would provide a
sensor that was more sensitive, required less area and operated faster than the
floating-gate sensor. Therefore the layout of a dual half-adder was
modified from floating-gate to floating-diffusion and included on the DP3 chip
as an evaluation device. Later, when the DP3 was corrected and became the
DP3A mask set, some of the cells were modified; the floating-diffusion dual
half-adder was modified to the single half-adder described above in Section
6.1.

A block diagram of the floating-diffusion cascaded dual half-adder is
shown in Figure 6-10 and the associated waveforms used during testing are
shown in Figure 6-11.


Figure 6-10. Block Diagram of a DP3 Floating-Diffusion Cascaded
Dual Half-Adder DCCL Cell.

The dual half-adder test cell operated correctly, as shown in Figure 6-12.

Figure 6-12. Functional demonstration of the floating-diffusion
cascaded dual half-adders at a clock frequency of
20 kHz with inputs of A = 01001011, B = 00101101 and
G = 00010111.

6.4  TESTING  THE  FLOATING-GATE  FULL-ADDER 

In a full-adder (3 inputs), when all inputs are at a binary "1", the
three input charges have to fill three serial charge buckets, whereas in
a half-adder (2 inputs) the two input charges have to fill two serial charge
buckets. Thus the full-adder will always be slower than the half-adder.
However, the full-adder requires only one third of the area required by the
cascaded dual half-adders and the two one-bit delay lines required to
perform the same full-adder function.

With  this  in  mind  for  future  large  arrays,  we  incorporated  a floating- 
gate  full-adder  test  cell  on  the  DP3A  design. 


The  block  diagram  of  the  floating-gate  full-adder  is  shown  in  Figure 
6-13  and  the  waveforms  associated  with  testing  the  cell  are  shown  in  Figure 


Figure 6-15. Functional demonstration of the floating-gate
full-adder test cell at a clock frequency of 20 kHz and
inputs of A = 11110011, B = 11000011, and G = 11000000.


6.5  TESTING  THE  FLOATING-DIFFUSION  FULL-ADDER 

A variation  of  the  full-adder  test  cell  was  included  on  the  DP3A  design. 
This  variation  has  a floating-diffusion  charge  sensing  switch  instead  of  the 
floating-gate.  A block  diagram  of  the  floating-diffusion  full-adder  is  shown 
in  Figure  6-16  and  the  clock  waveforms  associated  with  testing  it  are  shown  in 
Figure  6-17.  The  full-adder  operated  satisfactorily  as  shown  in  Figure  6-18. 

Figure 6-18. Functional demonstration of the floating-diffusion
full-adder test cell at a clock frequency of 20 kHz
with inputs of A = 11110011, B = 11000011 and
G = 11000000.

6.6  TESTING  DP3  BURIED  CHANNEL  DESIGNS 

Two test cells on the DP3 received the buried channel implant: one logic
circuit and a 10-bit shift register. Unfortunately, the logic circuit cannot
be tested due to a contact mask error which precludes its operation. However,
the 10-bit shift register has been tested with favorable results. The length
of the register makes transfer efficiency comparisons difficult.

In p-surface channel operation, the surface potential is less negative
than the gate potential, i.e.,

\[ \phi_s = V_g + \frac{Q_{ss}}{C_{ox}} - V_0 + \left( V_0^2 - \frac{2 Q_{ss} V_0}{C_{ox}} - 2 V_0 V_g \right)^{1/2} \]

where

\[ V_0 = \frac{q N_d \epsilon_s}{C_{ox}^2} \]

However, in p-buried channel operation, the channel potential is more
negative than the gate potential, i.e.,

\[ \phi_c = V_g + \frac{Q_{ss}}{C_{ox}} - \frac{q (N_a - N_d) t^2}{2 \epsilon_s} - \frac{q (N_a - N_d) t}{C_{ox}} \]

where t is the thickness of the implanted channel.

In this equation, the last term dominates, i.e.,

\[ \phi_c \approx V_g - \frac{q (N_a - N_d) t}{C_{ox}} \]

In both surface and buried channel operation, the surface (or channel)
potential is displaced from the gate voltage by a quantity which is inversely
proportional to Cox. Thus, in surface channel operation, one must make a poly
2 gate more negative than a poly 1 gate to equate surface potentials.

However, in buried channel operation, one must make a poly 1 gate more
negative than a poly 2 gate to equate channel potentials.
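The opposite poly 1 / poly 2 behavior can be checked numerically. The sketch below assumes the usual depletion-approximation expressions for a p-channel device, with the poly 2 gate over the thicker oxide (1000 Å versus 3000 Å, as in Section 7.5). The doping levels, implant thickness, gate voltage and function names are illustrative assumptions, not measured DP3 values.

```python
import math

Q = 1.602e-19             # electron charge, C
EPS0 = 8.854e-14          # vacuum permittivity, F/cm
EPS_SI = 11.7 * EPS0      # silicon permittivity, F/cm
EPS_OX = 3.9 * EPS0       # SiO2 permittivity, F/cm


def c_ox(t_ox_angstrom):
    """Oxide capacitance per unit area in F/cm^2 (1 Angstrom = 1e-8 cm)."""
    return EPS_OX / (t_ox_angstrom * 1e-8)


def surface_potential(v_g, t_ox_angstrom, n_d=1e15, q_ss=0.0):
    """Surface potential (V) of an empty p-surface-channel well.

    Valid for negative gate voltages. n_d (cm^-3) and q_ss (C/cm^2)
    are assumed values.
    """
    cox = c_ox(t_ox_angstrom)
    v0 = Q * n_d * EPS_SI / cox ** 2
    v_eff = v_g + q_ss / cox
    return v_eff - v0 + math.sqrt(v0 ** 2 - 2.0 * v0 * v_eff)


def buried_channel_potential(v_g, t_ox_angstrom, n_net=1e16, t_cm=3e-5,
                             q_ss=0.0):
    """Channel potential (V) of a p-buried channel.

    n_net is the assumed net implant doping Na - Nd (cm^-3) and t_cm the
    assumed implant thickness (cm).
    """
    cox = c_ox(t_ox_angstrom)
    return (v_g + q_ss / cox
            - Q * n_net * t_cm ** 2 / (2.0 * EPS_SI)
            - Q * n_net * t_cm / cox)


if __name__ == "__main__":
    for name, t_ox in (("poly 1", 1000), ("poly 2", 3000)):
        print(name,
              "surface:", round(surface_potential(-10.0, t_ox), 2),
              "buried:", round(buried_channel_potential(-10.0, t_ox), 2))
```

With these assumed values, the thicker-oxide gate gives the less negative surface potential but the more negative buried-channel potential at the same gate voltage, which is the asymmetry the verification experiment relies on.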

Based on this, one may devise an experiment to verify buried channel
operation. In this experiment, which is summarized in Figures 6-19 and 6-20,
a small amplitude, 4-phase clocking scheme is used. The poly 1 clock phases
are offset such that they never go more positive than the most negative
portion of the poly 2 clock phases. Thus, surface channel operation is
impossible. Based upon this, one may conclude that the output shown in
Figure 6-20 is buried channel operation.

Figure 6-20. Potential Diagram of the Buried
Channel Shift Register


7.  SEMICONDUCTOR  PROCESSING 


7.1  INTRODUCTION 

The CCD Processing Laboratory has placed great emphasis on process
standardization. This includes all major processing steps such as field
oxide and gate oxide growth, polysilicon film deposition and gate
configuration delineation, standardized phosphorus and boron implantation
methods, TEOS/silox deposition and densification cycles, metallization methods,
elimination of wafer material defects by means of mechanical abrasion and heavy
phosphorus (N+) gettering of the underside of each wafer, and a variety of
rapid in-process electrical checks (e.g., C-V measurements; B-T
measurements; Cmin/Cmax plots for impurity concentration determination, etc.).

Process  standardization  has  also  been  achieved  by  automation  of 
critical  processing  steps;  this  includes  automated  field  and  gate  oxide 
furnaces;  automated  wafer  coating  and  developing  equipment;  etc.  In  general, 
a considerable  effort  has  been  made  to  eliminate  all  unnecessary  processing 
steps.  This  processing  philosophy  has  significantly  increased  LSI  device 
yields,  while  shortening  wafer  lot  turn-around  time. 

Current CCD circuit layouts, which include resultant topologies, are
compatible with both in-house photolithographic capabilities and CCD LSI
fabrication technology. Complex CCD LSIs are produced with eight or more
mask levels that permit 5 to 7 micron circuit geometries. Gate densities
are limited by current wet chemistry etching and dry plasma etching processes;
these processes are used to define polysilicon gate patterns that limit
overall CCD circuit densities. Digital CCDs are currently fabricated with two
polysilicon gate patterns, 1000 Å thermally grown silicon dioxide (SiO2) gate
dielectrics, and 10,000 Å thick E-beam deposited aluminum interconnection
patterns.

7.2  DCCD  PROCESS  EVOLUTION 

Fabrication  processes  for  the  DP  series  of  devices  evolved  through  a 
number  of  mask  layout  iterations;  these  were  related  to  circuit  geometry 
changes  that  improved  major  operating  parameters  such  as  dynamic  range  and 
transfer  efficiency.  Each  new  generation  of  circuits  necessitated  new  mask 
sets  with  corresponding  process  sequence  changes,  primarily  connected  with 
gate  technology  variations. 

Table 7 lists the four major DP generations, including significant
differences between each generation.

Table 7. Six Process Modifications: DP-0, DP-1, DP-2, DP-3

Circuit   Mask Type               Gate Technology                     Design Rules in Mils
DP-0      Slip                    Metal/Poly Standard                 0.3
DP-1      Slip                    Metal/Poly with Poly Protect Mask   0.45
DP-2      Slip and Conventional   Poly/Poly                           0.3
DP-3      Conventional            Poly/Poly                           0.3

7.3  MASK  GENERATIONS 

"Slip"  masks  were  selected  for  initial  mask  generations.  This  approach 
provided  three  or  four  mask  levels  per  plate;  during  the  alignment  of  successive 
mask  levels,  the  "slip"  mask  was  displaced  in  either  the  X or  Y direction. 
Combining  four  mask  levels  onto  one  plate  reduced  mask  fabrication  costs  and 
delivery  schedules.  The  major  problem  encountered  with  slip  masks  involved 
alignment  difficulties  caused  by  considerable  "run-out"  and  "rotation"  between 
successive mask levels. Normal lens aberrations did not permit the mask
manufacturer to compensate for dimensional tolerance variations between mask levels.
These  masks  were  not  produced  by  TRW's  in-house  mask-making  facility,  which  was 
not  in  existence  during  the  initial  phase  of  this  program.  While  slip  masks 
were  used  for  DP-0,  DP-1  and  DP-2  mask  generations,  fabrication  difficulties 
combined  with  low  yields  necessitated  the  abandonment  of  this  approach  in  favor 
of  conventional  mask  sets,  which  provided  superior  results.  Conventional  mask 
sets  were  therefore  used  for  later  versions  of  DP-2  and  DP-3  series. 

7.4  GATE  TECHNOLOGY 

Two competing approaches were evaluated to determine the most reliable and
reproducible method of fabricating CCD gate structures. Two basic processes
were considered that affected the gate material and the Level II gate structures.
Aluminum was used for the Level II gates for DP-0 and DP-1. Poor step coverage
by the metal led to the use of polysilicon, which provided a significant
improvement in step coverage over that provided by aluminum. Polysilicon Level
II gates were used for DP-2 and DP-3 circuits with success.

7.5 PROCESS MODIFICATIONS


Significant oxide undercut occurred during fabrication of initial DP-1
lots using the double polysilicon gate configuration. Several process changes
were made to prevent this occurrence. A special photoresist masking step was
used to cover all metal and polysilicon gates and interconnects before oxide
cuts were made. The special "poly-protect" masking step successfully
prevented oxide undercut, as shown in Figure 7-1. Note that the severe oxide
undercut shown in Figure 7-1b resulted in broken metallization lines or poor
step coverage, necessitating elimination of this problem by inclusion of the
additional "poly-protect" masking step.

Figure 7-1. Polysilicon Protect Configuration

a. Polysilicon pattern defined.

b. Oxide etched without "poly-protect" masking step,
showing undercut of polysilicon gate - interconnection
pattern.

c. Oxide etched with "poly-protect", showing no undercut
of polysilicon region, permitting smooth transition for
subsequent metallization coverage.


Additional process modifications were made to improve gate control; the first
gate oxide and second gate oxide thicknesses were changed from 1000 Å/2000 Å to
1000 Å/3000 Å. Both boron- and phosphorus-doped second polysilicon layers were
fabricated; these were gas-phase doped. Subsequent process improvements
permitted in-situ doping of polysilicon with arsenic, thereby reducing the
number of process steps.


The  standard  line  and  separation  distances  were  7.5  microns.  The  most 
advanced  circuit  produced  (DP-3)  was  designed  with  5 micron  line  separations. 
No  major  process  modifications  were  required  by  this  design  rule  change;  the 
same  positive  photoresist  was  used  in  going  from  7.5  microns  to  5 microns. 
However,  adjustments  were  required  in  development  and  exposure  times. 
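Because the design rules are quoted in mils in Table 7 and in microns in the text, a short conversion helps in comparing them. This is plain unit arithmetic (1 mil = 25.4 microns); the function names are illustrative.

```python
MICRONS_PER_MIL = 25.4  # 1 mil = 0.001 inch = 25.4 microns


def mils_to_microns(mils):
    """Convert a dimension in mils to microns."""
    return mils * MICRONS_PER_MIL


def microns_to_mils(microns):
    """Convert a dimension in microns to mils."""
    return microns / MICRONS_PER_MIL


if __name__ == "__main__":
    for rule in (0.3, 0.45):
        print(f"{rule} mil = {mils_to_microns(rule):.2f} microns")
    for width in (5.0, 7.5):
        print(f"{width} microns = {microns_to_mils(width):.3f} mil")
```

A 0.3 mil rule is about 7.6 microns, close to the 7.5 micron standard line and separation distance; 5 microns is about 0.2 mil.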

7.6 BORON PENETRATION

A DP-1 wafer lot was processed by doping the source and drain regions
and polysilicon Level II film simultaneously. This was an attempt to simplify
processing by eliminating the nitride layer as a diffusion mask. Very low
threshold voltage values were achieved and, in some instances, depletion type
devices were produced. Boron penetration through the thin SiO2 gate insulator
into the Si substrate was suspected. In the worst case, "boron penetration"
will form a heavily doped P-type region immediately beneath the gate insulator
of the CCD devices, resulting in depletion type devices. Subsequent tests
verified this occurrence. A common technique used in the study of impurity
penetration through SiO2 layers involves measurement of the depth of the
junction formed by impurity penetration; a determination of the time necessary
for penetration of the SiO2 layer to a particular depth in the silicon can be
made.

It is obvious that significant electrical effects due to boron
penetration will occur in CCD devices long before a pn junction can be detected
in the silicon substrate. At the onset of boron penetration into the silicon
substrate, ionized boron atoms will act as a negative charge layer at the
Si/SiO2 interface, shifting the flat band voltage of the MOS structure. Movement
in the flat band voltage can be detected as an equivalent change in MOS
threshold voltage. C-V measurements carried out on test capacitors clearly
showed boron penetration. Penetration was not uniform; it varied across a wafer,
as shown by variations in test transistor threshold voltage measurements.

Additional  experimental  work  is  required  to  characterize  boron  penetration 
effects.  This  includes: 

• Using relatively low temperatures (< 950°C) for all processes following
the boron diffusion step.

• Using short diffusion cycles during boron doping of the polycrystalline
film; this results in producing higher sheet resistance values for
polysilicon gates and interconnections. Processing tradeoffs must be
considered that control boron concentration in the polysilicon gate
structures versus boron penetration through the SiO2 film.

• Performing boron diffusions in a dry atmosphere while minimizing low
temperature wet cycles following boron diffusions.

• Increasing polysilicon film thickness to 5000 Å (3000 Å was used with
the wafers reported here).

The variety of problems encountered with boron doping of polysilicon
films resulted in a major processing change; polysilicon films were
subsequently in-situ doped with arsenic, thereby bypassing the boron doping
problem.

7.7 THERMAL OXIDE PROCESSES

Low temperature (920°C) gate oxides were produced for the DP device
series; these were wet oxides grown by steam oxidation of silicon. The
steam is produced by an in-situ reaction of H2 and O2 gases (the
oxygen-hydrogen torch method). Qss values were typically 5 x 10^10 cm^-2; Nss
(fast surface state density) was in the 10^9 cm^-2 range. Mobile ion
concentration was smaller than the sensitivity of the C-V measurement equipment.
Radiation hardness of these devices will be evaluated during the next phase
of this program.
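To give a feel for the magnitude of these charge densities, the flat-band voltage shift produced by a sheet of fixed charge at the interface can be estimated as q·N/Cox. The sketch below applies this to the typical Qss density quoted above, assuming the 1000 Å gate oxide of Section 7.1 and a relative permittivity of 3.9 for SiO2; the function name is illustrative.

```python
Q = 1.602e-19             # electron charge, C
EPS0 = 8.854e-14          # vacuum permittivity, F/cm
EPS_OX = 3.9 * EPS0       # SiO2 permittivity, F/cm (assumed)


def flatband_shift_volts(charge_density_cm2, t_ox_angstrom):
    """Magnitude of the flat-band shift (V) caused by a sheet charge of
    charge_density_cm2 electronic charges per cm^2 located at the
    Si/SiO2 interface, for an oxide t_ox_angstrom thick."""
    c_ox = EPS_OX / (t_ox_angstrom * 1e-8)   # F/cm^2
    return Q * charge_density_cm2 / c_ox


if __name__ == "__main__":
    shift = flatband_shift_volts(5e10, 1000)
    print(f"Flat-band shift for 5e10 cm^-2 on 1000 A oxide: {shift:.3f} V")
```

Under these assumptions the shift is a few tenths of a volt, and it scales linearly with both charge density and oxide thickness.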

Silicon dioxide, prepared by thermal oxidation of silicon in oxygen or
water ambients, generally will have positive charges associated with it. As
a result, the underlying silicon will be depleted or inverted if it is P-type
or accumulated if it is N-type. These charges or states may be classified in
at least four categories. The nature of these charges or states in
relationship to the SiO2-Si interface structure is indicated in Figure 7-2. The
charges include: a) fixed surface state or interface charges; b) mobile
charges within the oxide; c) surface recombination generation centers or
fast states; and d) traps within the oxide, which can be ionized by radiation.


(Figure 7-2 panel labels: c) FAST SURFACE, d) TRAPS IONIZED)

The fixed charge is apparently quite close to the SiO2-Si interface and
its density can vary from 0 to at least 2 x 10^10 electronic charges/cm^2.
The mobile charges are usually the result of processing contamination. On the
other hand, deliberate contamination can result in values over 10^13 electronic
charges cm^-2. The third category does not represent a fixed charge, but rather
may be associated with the often discussed "fast surface states". The density
of such active surface states may range from less than 10^10 cm^-2 upwards. The
presence of these states depends on processing conditions, while the silicon
surface potential determines whether or not they are charged. Positively
charged traps in the oxide have been observed after exposure to X-ray, electron
or other ionizing radiation. The concentration of these traps is of the order
of 10^18 cm^-3.

Another type of charge observed in the study of oxidized silicon is that
on the outer surface of the oxide. Usually this charge is a result of
migration in the vicinity of a biased junction or field plate. Such charge
migration requires a conductive surface, which is usually caused by contamination.

7.8  METAL  STEP  COVERAGE 

The  metal  step  coverage  problem  was  identified  early  in  the  process  with 
DP-0  lots.  See  Figure  7-3. 


Figure 7-3. Breaks in the Al Metallization (DP-0 Design)




The gulch or undercut field oxide of Figure 7-3 is shown again in Figure
7-4. It is caused during etching of the 1500 Å oxide covering the "metal"
channel region and by the steepness of the polysilicon film edge. This kind
of step is very difficult to cover by Al metallization, resulting in open metal
lines.


Figure  7-4.  "Gulch"  Formed  under  the  Polysilicon  Film 

It was found at TRW that films formed by thermal decomposition of
tetraethyl orthosilicate (TEOS) produce very smooth step coverage, as shown
in Figure 7-5. To use this TEOS deposition, an extra mask level was inserted
into the process sequence. Thus the DP-1 circuit employed the added mask
level to etch the oxide protecting the field oxide region where the step
coverage problem occurs.


Figure 7-5. "Gulch" and Steep Polysilicon Step Covered by a TEOS Film

The resultant process involves depositing TEOS over the polysilicon film
before the "poly protect" mask operation. After the "poly protect" masking
step, the oxide in the channel was etched out and fresh oxide regrown. Figure
7-6 shows a metallized poly step covered with TEOS, indicating good coverage,
whereas Figure 7-7 is a picture of a metallized poly step without TEOS.
Metal discontinuities can be observed in this area; also note the steepness
of the poly step, which is responsible for the metal coverage problem.


Figure 7-6. Metallization over poly step covered with TEOS (... Design)


Figure 7-7. Metallization over poly step without a TEOS de... (DP-1 Design)